Graylab / IgFold

Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies

abnumber.exceptions.ChainParseError: Found 2 antibody domains in sequence: #51

Closed: Demoyhy closed this issue 1 year ago

Demoyhy commented 1 year ago

Hello, I ran into a problem while running another prediction. I entered the sequence

QMKLMQSGGVMVRPGESATLSCVASGFDFSRNGFEWLRQGPGKGLQWLATVTFESKTHVTASARGRFTISRDNSRRTVYLQMTNLQPDDTAMYFCVKDQTIFHKNGAVDFFSYFDLWGRGAPVIVSAXDVVMTQSPEFLAVSLGERATLECKSSHSLLYAPYDKDALVWYQQKPGQPPKLLLDWASSRRSGVSDRFSATSASGRYFTLTISNFRADDVATYYCQQTRWTPPTFGGGTKVDLNX

but the run failed with the following output:

    Loading 4 IgFold models...
    Using device: cuda:0
    Loading /home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/trained_models/IgFold/igfold_1.ckpt...
    Loading /home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/trained_models/IgFold/igfold_2.ckpt...
    Loading /home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/trained_models/IgFold/igfold_3.ckpt...
    Loading /home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/trained_models/IgFold/igfold_5.ckpt...
    Successfully loaded 4 IgFold models.
    Loaded AntiBERTy model.
    Traceback (most recent call last):
      File "/data0/yanghy/workplace/igfold/perdiction_1.py", line 12, in <module>
        igfold.fold(
      File "/home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/IgFoldRunner.py", line 106, in fold
        model_out = fold(
      File "/home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/utils/folding.py", line 221, in fold
        process_prediction(
      File "/home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/utils/folding.py", line 139, in process_prediction
        renumber_pdb(
      File "/home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/igfold/utils/abnumber.py", line 59, in renumber_pdb
        abnum_chain = Chain(seq, scheme=scheme)
      File "/home/yanghy/anaconda3/envs/IgFold/lib/python3.8/site-packages/abnumber/chain.py", line 98, in __init__
        raise ChainParseError(f'Found {len(results)} antibody domains in sequence: "{sequence}"')
    abnumber.exceptions.ChainParseError: Found 2 antibody domains in sequence: "QMKLMQSGGVMVRPGESATLSCVASGFDFSRNGFEWLRQGPGKGLQWLATVTFESKTHVTASARGRFTISRDNSRRTVYLQMTNLQPDDTAMYFCVKDQTIFHKNGAVDFFSYFDLWGRGAPVIVSADVVMTQSPEFLAVSLGERATLECKSSHSLLYAPYDKDALVWYQQKPGQPPKLLLDWASSRRSGVSDRFSATSASGRYFTLTISNFRADDVATYYCQQTRWTPPTFGGGTKVDLN"

It seems the sequence was recognized as two separate antibody domains. How can I solve this problem?

jeffreyruffolo commented 1 year ago

Hello, it looks like you are trying to pass in too much of the antibody sequence. Specifically, here it looks like your sequence is ~240 residues, so I would guess it is the variable fragment plus the first conserved domain.

For IgFold, the expected input is just the variable fragment. If I truncate the sequence to QMKLMQSGGVMVRPGESATLSCVASGFDFSRNGFEWLRQGPGKGLQWLATVTFESKTHVTASARGRFTISRDNSRRTVYLQMTNLQPDDTAMYFCVKDQTIFHKNGAVDFFSYFDLWGRGAPVIVS, it looks like a reasonable input.
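As a rough illustration of passing one domain per chain, here is a minimal Python sketch. It assumes (this is not confirmed in the thread) that the X characters in the original input mark chain boundaries, that the first segment is the heavy-chain variable domain, and that the second is the light-chain one. The helper name split_on_placeholder is hypothetical. The commented-out call at the end follows the usage shown in the IgFold README; check it against your installed version.

```python
from typing import Dict, List

# Raw input exactly as posted in the issue, with the 'X' placeholders.
RAW_SEQUENCE = (
    "QMKLMQSGGVMVRPGESATLSCVASGFDFSRNGFEWLRQGPGKGLQWLATVTFESKTHVTASARGRFTISRDNSRRTVYLQMTNLQPDDTAMYFCVKDQTIFHKNGAVDFFSYFDLWGRGAPVIVSAX"
    "DVVMTQSPEFLAVSLGERATLECKSSHSLLYAPYDKDALVWYQQKPGQPPKLLLDWASSRRSGVSDRFSATSASGRYFTLTISNFRADDVATYYCQQTRWTPPTFGGGTKVDLNX"
)


def split_on_placeholder(raw: str, placeholder: str = "X") -> List[str]:
    """Hypothetical helper: split a raw sequence on a placeholder, dropping empty parts."""
    return [part for part in raw.split(placeholder) if part]


chains = split_on_placeholder(RAW_SEQUENCE)

# Assumption: segment order is heavy, then light. Verify this for your own data.
sequences: Dict[str, str] = {"H": chains[0], "L": chains[1]}

for chain_id, seq in sequences.items():
    print(chain_id, len(seq), seq[:10] + "...")

# With IgFold installed, the dictionary can then be passed as in the README:
#
#   from igfold import IgFoldRunner
#   igfold = IgFoldRunner()
#   igfold.fold("my_antibody.pdb", sequences=sequences, do_refine=False, do_renum=True)
```

Each entry in the dictionary then holds a single domain, which is what the renumbering step in the traceback above expects. If only the heavy chain is of interest, the dictionary can contain just the "H" entry.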

Demoyhy commented 1 year ago

Thank you for your answer; truncating the sequence this way worked. I have one more question: can IgFold be used to predict the structure of antigens? Do you have any suggestions for this?