DeepChainBio / bio-transformers

bio-transformers is a wrapper on top of the ESM/ProtBert models, which are trained on millions of proteins and used to compute embeddings.
https://bio-transformers.readthedocs.io/en/latest/getting_started/install.html
Apache License 2.0

Add "full" pool_mode #11

Closed. KevinEloff closed this 3 years ago.

KevinEloff commented 3 years ago

Add option to select "full" as a pool_mode for ProtBert embeddings.

The "full" option returns the full sequence of per-token embeddings rather than a reduced version obtained with mean, cls, max, etc. With pool_mode=["full"], the returned shape is (num_seqs, seq_size, emb_size).

Note: currently, when using "full", all sequences must be the same length.
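
To make the shape difference concrete, here is a minimal NumPy sketch of reduced versus "full" pooling. It is not the bio-transformers implementation: pool_embeddings is a hypothetical helper, the random array stands in for real model output, and it assumes the CLS token sits at position 0 of each sequence.

```python
import numpy as np


def pool_embeddings(token_embeddings, pool_mode):
    """Hypothetical helper (not the bio-transformers API).

    token_embeddings has shape (num_seqs, seq_size, emb_size).
    """
    if pool_mode == "full":
        # No reduction: keep one embedding per token.
        return token_embeddings
    if pool_mode == "mean":
        # Average over the sequence axis.
        return token_embeddings.mean(axis=1)
    if pool_mode == "max":
        # Element-wise maximum over the sequence axis.
        return token_embeddings.max(axis=1)
    if pool_mode == "cls":
        # Assumption: the CLS token is the first position.
        return token_embeddings[:, 0, :]
    raise ValueError(f"Unknown pool_mode: {pool_mode!r}")


# Random stand-in for model output: 3 sequences, 5 tokens, 8 dimensions.
num_seqs, seq_size, emb_size = 3, 5, 8
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(num_seqs, seq_size, emb_size))

full = pool_embeddings(token_embeddings, "full")
mean = pool_embeddings(token_embeddings, "mean")
print(full.shape)  # (3, 5, 8)
print(mean.shape)  # (3, 8)
```

The reduced modes collapse the seq_size axis, so their output is rectangular regardless of sequence length. "full" keeps that axis, which is why a single array can only be returned when every sequence has the same length.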

KevinEloff commented 3 years ago

We should add a check at the beginning of the function to ensure that all sequences have the same length. If the sequences have different lengths, it should raise an error.

Checks added to the compute_embeddings function. Now, when pool_mode is "full", all items in sequence_list must be the same length.
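
For illustration, a check of this kind could look like the sketch below. It is not the code merged in this PR: check_equal_lengths is a hypothetical name, ValueError is an assumed choice of exception, and pool_mode is assumed to be a list or tuple of strings.

```python
def check_equal_lengths(sequence_list, pool_mode):
    """Hypothetical validation helper (not the actual bio-transformers code).

    Raises ValueError if "full" pooling is requested and the sequences
    do not all have the same length.
    """
    # Assumption: pool_mode is a list or tuple of mode names.
    if "full" not in pool_mode:
        return

    lengths = {len(seq) for seq in sequence_list}
    if len(lengths) > 1:
        raise ValueError(
            'pool_mode "full" requires all sequences to have the same length, '
            f"but found lengths {sorted(lengths)}"
        )


# Equal lengths pass silently.
check_equal_lengths(["MKTAY", "GAVLK"], ["full"])

# Unequal lengths are fine for reduced modes such as "mean".
check_equal_lengths(["MKT", "GAVLK"], ["mean"])

# Unequal lengths with "full" raise an error.
try:
    check_equal_lengths(["MKT", "GAVLK"], ["full"])
except ValueError as err:
    print(f"Rejected: {err}")
```

Running the check before any model call means a mismatched batch fails fast, rather than after the embeddings have already been computed.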