ayushkarnawat / profit

Exploring evolutionary protein fitness landscapes
MIT License

Pre-trained protein embeddings #5

Open ayushkarnawat opened 4 years ago

ayushkarnawat commented 4 years ago

When preprocessing primary structure sequences (i.e., lists of residue names), representing proteins in a higher-dimensional embedding space has been shown to improve model performance on tasks such as mutant stability prediction [1], contact prediction [2][3], homology detection [4], and others [5].

As such, it may be worth implementing the computation of protein embeddings for sequence-based tasks, either by training an embedding from scratch or by using a publicly available pre-trained embedding space.
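To make the "from scratch" option concrete, here is a minimal sketch of an embedding lookup for residue sequences. It uses only the standard library and is not profit's implementation: the vocabulary, the `X` catch-all token, and the names `init_embedding_table` and `embed_sequence` are all assumptions made for illustration. In a real model the table would be a trainable parameter rather than fixed random values.

```python
import random

# The 20 standard amino acids as one-letter codes (assumed vocabulary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNKNOWN = "X"  # hypothetical catch-all token for non-standard residues
VOCAB = {aa: idx for idx, aa in enumerate(AMINO_ACIDS + UNKNOWN)}


def init_embedding_table(dim, seed=0):
    """Return a randomly initialised table with one `dim`-length vector per token."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in VOCAB]


def embed_sequence(sequence, table):
    """Map each residue in `sequence` to its row in the embedding table."""
    unk = VOCAB[UNKNOWN]
    return [table[VOCAB.get(residue, unk)] for residue in sequence.upper()]


table = init_embedding_table(dim=8)
embedded = embed_sequence("MKTAYIAK", table)
print(len(embedded), len(embedded[0]))  # sequence length, embedding dimension
```

Each residue becomes a dense vector, so a sequence of length L maps to an L x dim matrix that a downstream sequence model can consume.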

ayushkarnawat commented 4 years ago

This is intended primarily for use with the sequence-based preprocessors/models (i.e., TransformerPreprocessor).

NOTE: Set use_pretrained=True to use the pre-trained embeddings.
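As a rough illustration of how a `use_pretrained` switch could behave, the sketch below chooses between loading supplied vectors and random initialisation. Only the flag name comes from this thread. The function `build_embedding_table`, its other parameters, and the toy vectors are hypothetical, and this is not the actual `TransformerPreprocessor` API.

```python
import random


def build_embedding_table(vocab, dim, use_pretrained=False, pretrained=None, seed=0):
    """Return a {token: vector} table, pre-trained or randomly initialised.

    Hypothetical helper: `pretrained` is assumed to be a dict mapping each
    token to a sequence of `dim` floats.
    """
    if use_pretrained:
        if pretrained is None:
            raise ValueError("use_pretrained=True requires pretrained vectors")
        missing = [tok for tok in vocab if tok not in pretrained]
        if missing:
            raise KeyError(f"no pre-trained vector for tokens: {missing}")
        if any(len(pretrained[tok]) != dim for tok in vocab):
            raise ValueError(f"pre-trained vectors must have length {dim}")
        # Copy so later updates do not mutate the caller's vectors.
        return {tok: list(pretrained[tok]) for tok in vocab}
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for tok in vocab}


# Toy placeholder vectors for illustration only, not real protein embeddings.
toy_pretrained = {"A": [0.1, 0.2], "C": [0.3, 0.4]}

loaded = build_embedding_table("AC", dim=2, use_pretrained=True, pretrained=toy_pretrained)
scratch = build_embedding_table("AC", dim=2)
```

Failing loudly when the flag is set but no vectors are available avoids silently falling back to random initialisation.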