Bug description
When I use ESMFold, adding the "--cpu-offload" flag does not solve the out-of-memory problem for long sequences. My GPU is an A100 32GB. Please help me. The error output is as follows:
22/11/10 13:56:33 | INFO | root | Reading sequences from hipAB.fasta
22/11/10 13:56:33 | INFO | root | Loaded 5 sequences from hipAB.fasta
22/11/10 13:56:33 | INFO | root | Loading model
22/11/10 13:57:23 | INFO | torch.distributed.nn.jit.instantiator | Created a temporary directory at /tmp/tmp4ekwnkqe
22/11/10 13:57:23 | INFO | torch.distributed.nn.jit.instantiator | Writing /tmp/tmp4ekwnkqe/_remote_module_non_scriptable.py
22/11/10 13:57:23 | INFO | torch.distributed.distributed_c10d | Added key: store_based_barrier_key:1 to store for rank: 0
22/11/10 13:57:23 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
22/11/10 13:57:26 | INFO | root | Starting Predictions
22/11/10 13:59:12 | INFO | root | Predicted structure for hipAB-750 with length 765, pLDDT 51.2, pTM 0.264 in 105.6s. 1 / 5 completed.
22/11/10 13:59:15 | INFO | root | Failed (CUDA out of memory) on sequence hipAB-900 of length 918.
22/11/10 13:59:17 | INFO | root | Failed (CUDA out of memory) on sequence hipAB-1050 of length 1071.
22/11/10 13:59:19 | INFO | root | Failed (CUDA out of memory) on sequence hipAB-1200 of length 1224.
22/11/10 13:59:20 | INFO | root | Failed (CUDA out of memory) on sequence hipAB-1350 of length 1377.
/home/houj21/miniconda3/envs/esm/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:930: UserWarning: Module is put on CPU and will thus have flattening and sharding run on CPU, which is less efficient than on GPU. We recommend passing in device_id argument which will enable FSDP to put module on GPU device, module must also be on GPU device to work with sync_module_states=True flag which requires GPU communication.
"Module is put on CPU and will thus have flattening and sharding"