When preprocessing primary-structure sequences (i.e., lists of residue-name strings), representing proteins in a higher-dimensional embedding space has been shown to improve model performance on tasks such as mutant stability prediction [1], contact prediction [2][3], homology detection [4], and others [5].
As such, it might be worth implementing the computation of protein embeddings for sequence-based tasks, either by training an embedding from scratch or by using a publicly available embedding space.
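To make the idea concrete, here is a minimal sketch of the "from scratch" option: each residue name is mapped to an integer index, which selects a row of an embedding table. Everything in it is an assumption made for illustration: the three-letter vocabulary, the `UNK` fallback, the names `make_embedding_table` and `embed_sequence`, and the dimension of 32. The table is filled with random values as a placeholder; in practice it would be learned during training or replaced by vectors from a pretrained model.

```python
import random

# Assumed vocabulary: the 20 standard amino acids as three-letter residue names.
RESIDUES = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]
UNKNOWN = "UNK"  # hypothetical catch-all for non-standard residue names
VOCAB = {name: index for index, name in enumerate(RESIDUES + [UNKNOWN])}


def make_embedding_table(vocab_size, dim, seed=0):
    """Return a vocab_size x dim table of small random values.

    Placeholder only: a real table would be learned or loaded from a
    pretrained embedding rather than sampled at random.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed_sequence(residue_names, table):
    """Map each residue name to its embedding vector (one row per residue)."""
    unknown_index = VOCAB[UNKNOWN]
    return [table[VOCAB.get(name.upper(), unknown_index)] for name in residue_names]


table = make_embedding_table(len(VOCAB), dim=32)
sequence = ["MET", "LYS", "XAA", "GLY"]  # "XAA" is not in the assumed vocabulary
embedded = embed_sequence(sequence, table)
print(len(embedded), len(embedded[0]))  # number of residues, embedding dimension
```

The lookup itself would stay the same when using a publicly available embedding; only the source of the table changes. Deep-learning frameworks provide an equivalent trainable layer (for example `torch.nn.Embedding` in PyTorch), which would normally replace the hand-rolled table above.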