agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
Academic Free License v3.0

Speed up via xformers #120

Open mortonjt opened 1 year ago

mortonjt commented 1 year ago

Just in case you weren't familiar with this: there is an xformers library that can provide a >4x speed-up on transformer operations: https://github.com/facebookresearch/xformers
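A minimal sketch of the xformers attention call being suggested here (not ProtTrans code; it assumes xformers is installed with CUDA support and uses toy tensors):

```python
# Sketch of xformers' memory-efficient attention as a drop-in replacement for the
# quadratic softmax(QK^T)V computation. Shapes are [batch, seq_len, n_heads, head_dim].
import torch
import xformers.ops as xops

q = torch.randn(2, 512, 16, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 512, 16, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 512, 16, 64, device="cuda", dtype=torch.float16)

# Output has the same shape as q; the full attention matrix is never materialized.
out = xops.memory_efficient_attention(q, k, v)
```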

It could be low-hanging fruit for speeding up the operations in this library :)

mheinzinger commented 1 year ago

Hi Jamie,

thanks for reaching out! I wanted to try this before answering, but obviously it took me way too long. I already gave it a shot a few weeks ago but failed to reach a significant speed-up; maybe I did something wrong (I used it for translation with the new ProstT5 model).

Have you had positive experiences with this on any protein language models?

mortonjt commented 1 year ago

Sorry to hear that. I haven't tried this out yet for protein LLMs (I've only tested it on Stable Diffusion), but it is on my radar. I'm hoping it could be useful for inference and speed up the embedding calculations (which we're noticing are a bottleneck for protein annotation).

mheinzinger commented 1 year ago

Hm, how many proteins are you trying to label? From my experience, the ProtT5-XL-U50 encoder-only model in half precision, using batching as described here, reaches around 0.1 s/protein on average for the ~20k proteins of the human proteome (so around 30 minutes for human).
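For context, a condensed sketch of the batched, half-precision, encoder-only setup described above; the checkpoint name and preprocessing are assumptions based on the public ProtT5-XL-U50 fp16 encoder checkpoint, not copied from the linked example:

```python
# Sketch: extract per-protein embeddings with the fp16 encoder-only ProtT5-XL-U50 model.
import re
import torch
from transformers import T5EncoderModel, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
ckpt = "Rostlab/prot_t5_xl_half_uniref50-enc"  # assumed fp16 encoder-only checkpoint
tokenizer = T5Tokenizer.from_pretrained(ckpt, do_lower_case=False)
model = T5EncoderModel.from_pretrained(
    ckpt, torch_dtype=torch.float16 if device == "cuda" else torch.float32
).to(device).eval()

seqs = ["MKTAYIAKQR", "GSHMSLFDFFK"]  # toy sequences; batch size depends on GPU memory
# ProtT5 expects space-separated residues, with rare amino acids mapped to X.
seqs = [" ".join(re.sub(r"[UZOB]", "X", s)) for s in seqs]

batch = tokenizer(seqs, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    emb = model(**batch).last_hidden_state  # [batch, max_len, 1024]

# Mean-pool over non-padding positions to get one vector per protein.
mask = batch["attention_mask"].unsqueeze(-1)
per_protein = (emb * mask).sum(dim=1) / mask.sum(dim=1)
```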

mheinzinger commented 1 year ago

I had a brief look and stopped once I hit the following error: AttributeError: 'FeatureExtractionPipeline' object has no attribute 'enable_xformers_memory_efficient_attention' (I tried to extract embeddings from the ProtT5-XL-U50-fp16 model from my link in the post above). So I'm not sure whether it is as plug-and-play as I had hoped. In case you find some example/tutorial that shows how this should be done for plain Transformers (no diffusion etc.), please send it my way and I can give it a try. So far, I have only found tutorials on how to use this with diffusion models in Hugging Face (but most likely I just missed the right source).
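For contrast, the pattern those tutorials cover applies to diffusers pipelines, which do expose this toggle; the sketch below (checkpoint chosen only for illustration) shows why the same call fails on a transformers FeatureExtractionPipeline:

```python
# Sketch only: diffusers pipelines provide enable_xformers_memory_efficient_attention(),
# while transformers' FeatureExtractionPipeline does not, hence the AttributeError above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # example checkpoint, an assumption
    torch_dtype=torch.float16,
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # exists on diffusers pipelines only
```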

mortonjt commented 1 year ago

Regarding examples, I first saw xformers being used in https://github.com/Stability-AI/stablediffusion, so yes, I have only seen it used in diffusion models.

We were trying to embed all of UniRef at one point, but had to resort to just a subset. We are trying to embed proteins in microbial metagenomes, and those reference databases often contain >50M proteins.

mheinzinger commented 1 year ago

Yeah, I see your point. We also ran UniRef50 at one point, but only to make predictions, not for embedding extraction (especially as storing those embeddings becomes expensive quickly). The only things I can recommend (probably obvious, but still):