facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

OOM on Colab #19

Closed aiXander closed 3 years ago

aiXander commented 3 years ago

When trying to run python extract.py esm1_t34_670M_UR50S examples/P62593.fasta examples/P62593_reprs/ --repr_layers 34 --include mean I get:

tcmalloc: large alloc 2676842496 bytes == 0x548b6000 @  0x7f739765cb6b 0x7f739767c379 0x7f734850974e 0x7f734850b7b6 0x7f7382f74ba5 0x7f7392c2f1d9 0x551555 0x5a9dac 0x50a433 0x50beb4 0x507be4 0x508ec2 0x5a4c61 0x5a4fb8 0x4e012e 0x50a461 0x50beb4 0x507be4 0x588e5c 0x59fd0e 0x50d256 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x509900 0x50a2fd 0x50cc96 0x5095c8 0x50a2fd
tcmalloc: large alloc 2676842496 bytes == 0xf418c000 @  0x7f739765cb6b 0x7f739767c379 0x7f734850974e 0x7f734850b7b6 0x7f7382f74ba5 0x7f7392c2f1d9 0x551555 0x5a9dac 0x50a433 0x50beb4 0x507be4 0x508ec2 0x5a4c61 0x5a4fb8 0x4e012e 0x50a461 0x50beb4 0x507be4 0x588e5c 0x59fd0e 0x50d256 0x507be4 0x509900 0x50a2fd 0x50cc96 0x507be4 0x509900 0x50a2fd 0x50cc96 0x5095c8 0x50a2fd

I'm guessing that the Colab GPU (a T4 with 15 GB of memory in my case) can't fit the entire model in memory? Anybody else running into this?

aiXander commented 3 years ago

Confirmed: running python extract.py esm1_t6_43M_UR50S examples/some_proteins.fasta examples/representations/ --repr_layers 6 --include mean works perfectly fine!

tomsercu commented 3 years ago

Hi Xander, thanks for calling that out. Did you try reducing the batch size, e.g. --toks_per_batch 1022? Most likely the issue is the activations in the forward pass, even in no_grad mode (1022 leaves room for the +2 bos/eos tokens on the longest sequence).

aiXander commented 3 years ago

Thank you for such a quick reply! python extract.py esm1b_t33_650M_UR50S examples/some_proteins.fasta examples/representations/ --repr_layers 33 --include mean --toks_per_batch 1022 did indeed work!