bougui505 / alignscape


limit on MSA length? #2

Open Yogesh1-11 opened 5 months ago

Yogesh1-11 commented 5 months ago

Hi, I am getting the following error. Is there some limit on protein length?

n_input: 985
opening seq.aln
cuda:0
batch_size: 10
sigma: 22.5
alpha: 0.5
seq.aln opened with object id 138143134556656 for worker 0
seq.aln opened with object id 138143134557008 for worker 1

OutOfMemoryError                          Traceback (most recent call last)
in <cell line: 10>()
      9 test_aln = '/content/alignscape/data/Human_kinome/human_kinome_noPLK5.aln'
     10 if np.genfromtxt('seq.aln').size > 0:
---> 11     align_scape.main(ali="seq.aln", batch_size=10,
     12                      outname="som", somside=somside, nepochs=nepochs,
     13                      scheduler="exp", alpha=alpha, sigma= sigma)

2 frames
/content/alignscape/quicksom/som.py in __call__(self, x, learning_rate_op)
    317 expanded_x = x.expand(-1, self.grid_size, -1)
    318 expanded_weights = self.centroids.unsqueeze(0).expand((batch_size, -1, -1))
--> 319 delta = expanded_x - expanded_weights
    320 delta = torch.mul(learning_rate_multiplier.reshape(*learning_rate_multiplier.size(), 1).expand_as(delta), delta)
    321

OutOfMemoryError: CUDA out of memory. Tried to allocate 24.74 GiB (GPU 0; 14.75 GiB total capacity; 5.45 GiB already allocated; 9.14 GiB free; 5.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
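For readers hitting the same error: the failing line, `delta = expanded_x - expanded_weights`, is the first place where a full tensor of shape (batch_size, grid_size, dim) is actually materialized, since `expand` only creates views. The sketch below is a rough way to estimate that allocation and the largest batch size that fits a given memory budget. The function names and the example numbers (`somside = 50`, `dim = 1000`) are hypothetical and are not taken from alignscape; the real feature dimension depends on how the alignment is encoded.

```python
def delta_bytes(batch_size: int, grid_size: int, dim: int, itemsize: int = 4) -> int:
    """Bytes needed for one dense (batch_size, grid_size, dim) tensor.

    itemsize=4 assumes float32 elements.
    """
    return batch_size * grid_size * dim * itemsize


def max_batch_size(free_bytes: int, grid_size: int, dim: int, itemsize: int = 4) -> int:
    """Largest batch size whose delta tensor fits in free_bytes (0 if none fits)."""
    per_sample = grid_size * dim * itemsize
    return free_bytes // per_sample


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    somside = 50                 # SOM side length, so grid_size = somside ** 2
    dim = 1000                   # length of one encoded input vector
    grid_size = somside * somside

    needed = delta_bytes(batch_size=10, grid_size=grid_size, dim=dim)
    print(f"delta tensor: {needed / 2**30:.2f} GiB")

    budget = 4 * 2**30           # e.g. a 4 GiB memory budget
    print("max batch size for budget:", max_batch_size(budget, grid_size, dim))
```

Because the estimate is linear in each factor, halving the batch size or the number of SOM cells halves the allocation.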

Yogesh1-11 commented 5 months ago

warn_deprecated('vmap', 'torch.vmap')
1/100: 0/98500 | alpha: 0.500000 | sigma: 12.500000 | error: 1845.502197 | time: 0:00:32.272844 | eta: 7 days, 8:35:42.745127

bougui505 commented 4 months ago

Thanks for the report. We tried running the Human kinome dataset on a 4 GB GPU with a batch size of 10:

apptainer run --nv ./apptainer/alignscape.sif align_scape -a data/Human_kinome/human_kinome_noPLK5.aln -b 10

The calculation works, with an ETA of about 12 minutes. If you can, could you try running the calculation in Apptainer using this SIF image file?

The image can be downloaded from the Zenodo platform at the following link: https://zenodo.org/records/10417520

Let us know if it helps. Best,

bougui505 commented 4 months ago

By the way, we also changed the implementation of the multiplication

delta = torch.mul(learning_rate_multiplier.reshape(*learning_rate_multiplier.size(), 1).expand_as(delta), delta)

to an in-place multiplication, which avoids temporarily holding a second copy of the delta tensor. This should reduce peak memory usage.
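As an illustration of the difference (a minimal sketch with small, made-up shapes, not the actual alignscape code), the out-of-place `torch.mul` allocates a second tensor the size of `delta`, while `Tensor.mul_` writes the result into the existing buffer. The variable names and sizes below are assumptions for the example.

```python
import torch

# Small hypothetical shapes: (batch_size, grid_size, dim)
batch_size, grid_size, dim = 4, 9, 5
delta = torch.randn(batch_size, grid_size, dim)
learning_rate_multiplier = torch.rand(batch_size, grid_size)

# Out-of-place: the result lives in a new buffer, so two tensors of
# delta's size exist at the same time.
out = torch.mul(learning_rate_multiplier.unsqueeze(-1).expand_as(delta), delta)
out_is_new_buffer = out.data_ptr() != delta.data_ptr()

# In-place: the (batch, grid, 1) multiplier broadcasts over the last
# dimension and the result overwrites delta's own storage.
ptr_before = delta.data_ptr()
delta.mul_(learning_rate_multiplier.unsqueeze(-1))
same_buffer = delta.data_ptr() == ptr_before

results_match = torch.allclose(out, delta)
print(out_is_new_buffer, same_buffer, results_match)
```

The in-place form is only appropriate when the unscaled values of `delta` are not needed afterwards, for example by autograd.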