Graylab / IgFold

Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies

Input two sequences (H and L), but only predict one H structure #6

Closed leiqian-nmsu closed 2 years ago

leiqian-nmsu commented 2 years ago

Dear users, could I ask a question? Thanks! I installed IgFold locally, and the antibody and nanobody tests (from GitHub) ran fine.

When I ran a prediction on the CH67 antibody, I first mistakenly labeled H as L and L as H, so of course the H and L structures were wrong. I quickly noticed the mistake and changed the chain names back. However, since that correction, IgFold has behaved quite weirdly: it always predicts the H structure and omits the L structure. (I also used IgFold to predict other antibody structures, and it worked well for both H and L.)

This confuses me. Could IgFold somehow have been trained when I made the chain-name typo above? My script is as follows, thanks!

    from igfold import IgFoldRunner, init_pyrosetta

    init_pyrosetta()

    sequences = {
        "H": "QVQLVQSGAEVRKPGASVKVSCKASGYTFTDNYIHWVRQAPGQGLEWMGWIHPNSGATKYAQKFEGWVTMTRDTSISTVYMELSRSRSDDTAVYYCARAGLEPRSVDYYFYGLDVWGQGTAVTVSS",
        "L": "QSALTQPPSVSVAPGQTATITCGGNNIGRKRVDWFQQKPGQAPVLVVYERFSDSNSGTTATLTISRVEAGDEADYYCQVWDSDSDHVVFGGGTKLTVL",
    }
    pred_pdb = "CH67.pdb"

    igfold = IgFoldRunner()
    igfold.fold(
        pred_pdb,
        sequences=sequences,
        do_refine=True,
        do_renum=True,
    )

jeffreyruffolo commented 2 years ago

Hello, could you share the PDB file that you get when you run IgFold on your inputs? The code looks reasonable to me, so it might help with debugging to see what the output looks like.

leiqian-nmsu commented 2 years ago

Hi Jeffrey, thank you for your kind reply. Here are my PDB file and FASTA file (I changed both extensions to .txt to satisfy the upload file-type requirement): CH67_fasta.txt CH67_pdb.txt As you can see, the FASTA file contains both chains, but the PDB file contains only one chain (H). Thanks!

jeffreyruffolo commented 2 years ago

Thanks for sending these. It appears the second chain is being removed by the AbNum server that we use to renumber the PDB. This can sometimes happen if AbNum doesn't recognize one of the chains as an antibody. If you set do_renum=False, you should get a result with both chains.

If you need renumbering, I would recommend using the ANARCI tool or any other renumbering software. I'm hoping to add ANARCI to the IgFold repo soon, as it is less prone to failure than AbNum.
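
For anyone hitting the same symptom, here is a minimal sketch of the do_renum=False workaround suggested above, plus a quick check that every input chain actually appears in the output PDB. The helper names (chain_ids, missing_chains, fold_and_check) are hypothetical and not part of IgFold. The check assumes the output file uses the keys of the sequences dict ("H", "L") as chain IDs, as the uploaded CH67 output appears to do. Only the fold arguments already shown in this thread are used.

```python
def chain_ids(pdb_text):
    """Return chain IDs in order of first appearance among ATOM records."""
    seen = []
    for line in pdb_text.splitlines():
        # In the fixed-column PDB format, the chain ID is column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            cid = line[21]
            if cid not in seen:
                seen.append(cid)
    return seen


def missing_chains(sequences, pdb_text):
    """List input chain names that have no ATOM records in the PDB text."""
    present = set(chain_ids(pdb_text))
    return [name for name in sequences if name not in present]


def fold_and_check(sequences, pred_pdb, do_renum=False):
    """Hypothetical wrapper: fold, then report chains dropped from the output.

    do_renum defaults to False here so the AbNum renumbering step is skipped.
    """
    # Imported lazily so the helpers above work without IgFold installed.
    from igfold import IgFoldRunner, init_pyrosetta

    init_pyrosetta()
    runner = IgFoldRunner()
    runner.fold(
        pred_pdb,
        sequences=sequences,
        do_refine=True,
        do_renum=do_renum,
    )
    with open(pred_pdb) as handle:
        return missing_chains(sequences, handle.read())
```

Calling fold_and_check(sequences, "CH67.pdb") with the sequences from the original post should return an empty list if both chains survive. A result of ["L"] would reproduce the dropped light chain reported here.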