AndrewCRMartin / absplit

Code to split an antibody PDB file into Fv fragments with their antigens
GNU General Public License v3.0

Chothia files empty or missing #6

Closed by AndrewCRMartin 1 year ago

AndrewCRMartin commented 2 years ago

Empty:

pdb1p7k_0H.cho
pdb1s78_0P.cho
pdb1xct_0P.cho
pdb1ahw_0P.cho

Missing:

pdb6e9q_2
pdb6e9q_3
pdb5dt1_0
pdb4rqq_2
pdb4rqq_1
pdb3mug_1
pdb3mug_4
pdb3mug_3
pdb3mug_0
pdb5uy3_0
pdb4toy_0
pdb6uum_1
pdb6v6w_1H
pdb6uuh_0
pdb6uuh_1
pdb5ud9_0
pdb3q6f_0
pdb3q6f_3
pdb3q6f_2
pdb3q6f_5
pdb5mp6_0
pdb3mlt_2
pdb3mlt_3
pdb5w1g_0
pdb6w5a_0
pdb6bjz_0
pdb5ywf_0
pdb5ywf_1
pdb6qd6_1
pdb6qd6_0
pdb6xv8_0
pdb5wb1_0
pdb6qb4_0
pdb6qf9_0
pdb6qfc_0
pdb6qb9_1
pdb6qb9_0
pdb6qf9_1
pdb5kov_10
pdb5kov_3
pdb5kov_1
pdb5kov_7
pdb5kov_5
pdb3wbd_3
pdb3wbd_1
pdb5nm0_2
pdb5d9q_8
pdb5wdu_1
pdb5d9q_2
pdb5d9q_5
pdb5wdu_5
pdb5gru_3
pdb2znx_1
pdb2znx_3
pdb6y1r_3
pdb1i3v_0
pdb6y0e_1
pdb6y0e_0
pdb4b50_0

Present but no H/L chains:

pdb6e9q_1P
pdb6e9q_0P
pdb6vtt_0PH
pdb4rqq_0P
pdb3mug_5PH
pdb3mug_2PH
pdb5uy3_1P
pdb5uty_0PH
pdb5u7o_0PH
pdb5fyj_0PH
pdb5fyl_0PH
pdb4tvp_1PH
pdb5wdu_11PH
pdb6de7_0PH
pdb5utf_1PH
pdb5fyk_0PH
pdb5u7m_0PH
pdb5wdu_9PH
pdb6uum_0
pdb6v6w_3H
pdb6utk_1PH
pdb6chb_0P
pdb6ch9_1PH
pdb6chb_3P
pdb6ch8_1PH
pdb6chb_5P
pdb6ch7_1PH
pdb3q6f_4P
pdb3q6f_1P
pdb5mp6_1P
pdb3mlt_1P
pdb3mls_3P
pdb3mlr_0P
pdb3mls_1P
pdb3mls_0P
pdb3mlt_0P
pdb3mls_2P
pdb5w1m_1P
pdb5w1m_2P
pdb5w1m_3P
pdb5w1m_0P
pdb6w9g_0P
pdb6w9g_2P
pdb6w9g_1P
pdb5ywp_0P
pdb5ywp_1P
pdb6qd6_2P
pdb6xv8_1P
pdb7kqy_1P
pdb5vm4_4P
pdb5vm4_1P
pdb5vm4_0P
pdb5kov_8P
pdb3lrh_2P
pdb3lrh_3P
pdb3lrh_0P
pdb2znx_0PH
pdb2znx_2PH
pdb6y1r_0P
pdb6y1r_4P
pdb6y1r_1P
pdb6y1r_2P
pdb1i3v_1P
pdb1i3u_0H
pdb6xzf_0P
pdb7aej_0P
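
A check along the following lines could reproduce the three categories above. It is only a sketch: it assumes the `.cho` files are PDB-format text with the chain label in column 22 of `ATOM` records, and that the antibody chains are labelled `H` and `L`. The names `classify_cho_text` and `classify_cho` are hypothetical and not part of absplit.

```python
from pathlib import Path


def classify_cho_text(text):
    """Classify the contents of one Chothia-numbered file.

    Assumes PDB-format ATOM records with the chain label in column 22
    and antibody chains labelled H and L (assumptions, not confirmed).
    """
    if not text.strip():
        return "empty"
    chains = {
        line[21]
        for line in text.splitlines()
        if line.startswith("ATOM") and len(line) > 21
    }
    if chains & {"H", "L"}:
        return "ok"
    return "no H/L chains"


def classify_cho(path):
    """Return 'missing', 'empty', 'no H/L chains' or 'ok' for a file path."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    return classify_cho_text(path.read_text())
```

Running `classify_cho` over the expected output names and collecting the non-`ok` results would give lists of the same shape as the ones in this comment.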

AndrewCRMartin commented 2 years ago

Many of these are now handled fine.

The majority of the rest seem to be cases where the numbering fails.

AndrewCRMartin commented 2 years ago

The following still fail in numbering:

pdb1i3v_0.faa    pdb3q6f_0.faa   pdb4toy_0.faa    pdb5kov_3.faa   pdb5wb1_0.faa    pdb6qb4_0.faa   pdb6v6w_2PH.faa
pdb1i3v_1.faa    pdb3q6f_1.faa   pdb5d9q_2.faa    pdb5kov_4P.faa  pdb5wdu_1.faa    pdb6qb9_0.faa   pdb6v6w_3H.faa
pdb2znx_1.faa    pdb3q6f_2.faa   pdb5d9q_3PH.faa  pdb5kov_5.faa   pdb5wdu_2P.faa   pdb6qb9_1.faa   pdb6w5a_0.faa
pdb2znx_2PH.faa  pdb3q6f_3.faa   pdb5d9q_4PH.faa  pdb5kov_6P.faa  pdb5wdu_3.faa    pdb6qd6_0.faa   pdb6xv8_0.faa
pdb2znx_3.faa    pdb3q6f_4.faa   pdb5d9q_5.faa    pdb5kov_7.faa   pdb5wdu_4P.faa   pdb6qd6_1.faa   pdb6xv8_1.faa
pdb3mlt_2.faa    pdb3q6f_5.faa   pdb5d9q_6PH.faa  pdb5kov_8P.faa  pdb5wdu_5.faa    pdb6qd6_2.faa   pdb6y0e_0.faa
pdb3mlt_3.faa    pdb3wbd_1.faa   pdb5d9q_7PH.faa  pdb5kov_9P.faa  pdb5wdu_6PH.faa  pdb6qf9_0.faa   pdb6y0e_1.faa
pdb3mug_0.faa    pdb3wbd_2H.faa  pdb5d9q_8.faa    pdb5mp6_0.faa   pdb5wdu_7PH.faa  pdb6qf9_1.faa   pdb6y1r_0.faa
pdb3mug_1.faa    pdb3wbd_3.faa   pdb5dt1_0.faa    pdb5mp6_1P.faa  pdb5wdu_8PH.faa  pdb6qfc_0.faa   pdb6y1r_1.faa
pdb3mug_2H.faa   pdb4b50_0.faa   pdb5gru_3.faa    pdb5ud9_0.faa   pdb5wdu_9PH.faa  pdb6uuh_0.faa   pdb6y1r_2.faa
pdb3mug_3.faa    pdb4rqq_0.faa   pdb5kov_10.faa   pdb5uy3_0.faa   pdb5ywf_0.faa    pdb6uuh_1.faa   pdb6y1r_3.faa
pdb3mug_4.faa    pdb4rqq_1.faa   pdb5kov_1.faa    pdb5uy3_1.faa   pdb5ywf_1.faa    pdb6uum_1.faa   pdb6y1r_4.faa
pdb3mug_5H.faa   pdb4rqq_2.faa   pdb5kov_2P.faa   pdb5w1g_0.faa   pdb6bjz_0.faa    pdb6v6w_1H.faa

Some of these fail in abnum:


pdb1i3v_0.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb1i3v_1.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb3mlt_2.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb3mlt_3.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb4b50_0.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb5mp6_0.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb5mp6_1P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb5w1g_0.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) LFR1_End
pdb6qd6_0.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6qd6_1.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6qd6_2.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6xv8_0.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6xv8_1.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6y0e_0.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_Start
pdb6y0e_1.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_Start
pdb6y1r_0.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6y1r_1.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6y1r_2.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6y1r_3.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb6y1r_4.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb1i3u_0H.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb3mlr_0P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb3mls_0P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb3mls_1P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb3mls_2P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb3mls_3P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb3mlt_0P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb3mlt_1P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb5nm0_2.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End HFR3_End 
pdb5w1m_0P.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) LFR1_End
pdb5w1m_1P.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) LFR1_End
pdb5w1m_2P.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) LFR1_End
pdb5w1m_3P.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) LFR1_End
pdb6xzf_0P.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_Start
pdb7aej_0P.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
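
To see which abnum error dominates, the output above can be grouped by message. The following is a minimal sketch, assuming the layout shown here (a file name on one line, then a line starting with `# Error:`); `group_abnum_errors` is a hypothetical helper, not part of abnum or absplit.

```python
from collections import defaultdict


def group_abnum_errors(log_text):
    """Map each error message to the list of .faa files that produced it.

    Assumes the layout shown above: a file name on one line, followed by
    a line beginning '# Error:'.
    """
    groups = defaultdict(list)
    current_file = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# Error:"):
            if current_file is not None:
                message = line[len("# Error:"):].strip()
                groups[message].append(current_file)
                current_file = None
        else:
            # Any other non-blank line is taken to be a file name
            current_file = line
    return dict(groups)


sample = """\
pdb1i3v_0.faa
# Error: Unable to number sequence #1: failed assignment of segment(s) HFR2_End
pdb3mlt_2.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
pdb4b50_0.faa
# Error: Unable to number sequence #1: region lengths are out of bounds.
"""

for message, files in group_abnum_errors(sample).items():
    print(len(files), message)
```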

The rest are fails in pdbabnum rather than abnum itself.

pdb2znx_1.faa   pdb2znx_2PH.faa pdb2znx_3.faa   pdb3mug_0.faa   pdb3mug_1.faa
pdb3mug_2H.faa  pdb3mug_3.faa   pdb3mug_4.faa   pdb3mug_5H.faa  pdb3q6f_0.faa
pdb3q6f_1.faa   pdb3q6f_2.faa   pdb3q6f_3.faa   pdb3q6f_4.faa   pdb3q6f_5.faa
pdb3wbd_1.faa   pdb3wbd_2H.faa  pdb3wbd_3.faa   pdb4rqq_0.faa   pdb4rqq_1.faa
pdb4rqq_2.faa   pdb4toy_0.faa   pdb5d9q_2.faa   pdb5d9q_3PH.faa pdb5d9q_4PH.faa
pdb5d9q_5.faa   pdb5d9q_6PH.faa pdb5d9q_7PH.faa pdb5d9q_8.faa   pdb5dt1_0.faa
pdb5gru_3.faa   pdb5kov_10.faa  pdb5kov_1.faa   pdb5kov_2P.faa  pdb5kov_3.faa
pdb5kov_4P.faa  pdb5kov_5.faa   pdb5kov_6P.faa  pdb5kov_7.faa   pdb5kov_8P.faa
pdb5kov_9P.faa  pdb5ud9_0.faa   pdb5uy3_0.faa   pdb5uy3_1.faa   pdb5wb1_0.faa
pdb5wdu_1.faa   pdb5wdu_2P.faa  pdb5wdu_3.faa   pdb5wdu_4P.faa  pdb5wdu_5.faa
pdb5wdu_6PH.faa pdb5wdu_7PH.faa pdb5wdu_8PH.faa pdb5wdu_9PH.faa pdb5ywf_0.faa
pdb5ywf_1.faa   pdb6bjz_0.faa   pdb6qb4_0.faa   pdb6qb9_0.faa   pdb6qb9_1.faa
pdb6qf9_0.faa   pdb6qf9_1.faa   pdb6qfc_0.faa   pdb6uuh_0.faa   pdb6uuh_1.faa
pdb6uum_1.faa   pdb6v6w_1H.faa  pdb6v6w_2PH.faa pdb6v6w_3H.faa  pdb6w5a_0.faa

Some of these fail because pdbabnum doesn't deal with single chains properly, e.g. 6qfc_0.

AndrewCRMartin commented 2 years ago

Now OK:

pdb1ahw_0P.faa  pdb1s78_0P.faa  pdb3lrh_0P.faa  pdb3wbd_0H.faa  pdb5vm4_3.faa  pdb5vm4_7.faa  pdb7kqy_2.faa
pdb1ahw_1P.faa  pdb1s78_1P.faa  pdb3lrh_1P.faa  pdb5vm4_0.faa   pdb5vm4_4.faa  pdb5vm4_8.faa
pdb1p7k_0H.faa  pdb1xct_0P.faa  pdb3lrh_2P.faa  pdb5vm4_1.faa   pdb5vm4_5.faa  pdb7kqy_0.faa
pdb1p7k_1H.faa  pdb1xct_1P.faa  pdb3lrh_3P.faa  pdb5vm4_2.faa   pdb5vm4_6.faa  pdb7kqy_1.faa

PDB codes that are still bad and need fixing:

1i3u  1i3v  2znx  3mlr  3mls  3mlt  3mug  3q6f  3wbd  4b50
4rqq  4toy  4tvp  5d9q  5dt1  5fyj  5fyk  5fyl  5gru  5kov
5mp6  5nm0  5u7m  5u7o  5ud9  5utf  5uty  5uy3  5w1g  5w1m
5wb1  5wdu  5ywf  5ywp  6bjz  6ch7  6ch8  6ch9  6chb  6de7
6e9q  6qb4  6qb9  6qd6  6qf9  6qfc  6utk  6uuh  6uum  6v6w
6vtt  6w5a  6w9g  6xv8  6xzf  6y0e  6y1r  7aej

Of these, the following are from pdbabnum fails:

2znx  3mug  3q6f  3wbd  4rqq  4toy  5d9q  5dt1  5gru  5kov
5ud9  5uy3  5wb1  5wdu  5ywf  6bjz  6qb4  6qb9  6qf9  6qfc
6uuh  6uum  6v6w  6w5a

The following are from abnum fails:

1i3u  1i3v  3mlr  3mls  3mlt  4b50  5mp6  5nm0  5w1g  5w1m
6qd6  6xv8  6xzf  6y0e  6y1r  7aej

The following fail for unidentified reasons:

4tvp  5fyj  5fyk  5fyl  5u7m  5u7o  5utf  5uty  5ywp  6ch7
6ch8  6ch9  6chb  6de7  6e9q  6utk  6vtt  6w9g
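
The split above is a set difference: take the PDB code from each failing file name, then remove the codes already explained by pdbabnum or abnum. Here is a rough sketch, assuming file names of the form `pdb<code>_<n>...` as in the lists above; `pdb_code` and `unexplained` are hypothetical helper names, and only a small subset of the codes is used.

```python
import re

# File names look like pdb5kov_10.faa or pdb2znx_2PH.faa; the four
# characters after 'pdb' are taken to be the PDB code.
_CODE_RE = re.compile(r"^pdb([0-9][0-9a-z]{3})_")


def pdb_code(filename):
    """Return the PDB code embedded in a file name, or None if absent."""
    match = _CODE_RE.match(filename)
    return match.group(1) if match else None


def unexplained(bad_codes, *explained_groups):
    """Codes in bad_codes that appear in none of the explained groups."""
    explained = set().union(*explained_groups)
    return sorted(set(bad_codes) - explained)


# Small illustrative subset of the lists above
bad = {"1i3u", "2znx", "4tvp", "5fyj", "6qfc"}
pdbabnum_fails = {pdb_code(f) for f in ["pdb2znx_1.faa", "pdb6qfc_0.faa"]}
abnum_fails = {pdb_code("pdb1i3u_0H.faa")}

print(unexplained(bad, pdbabnum_fails, abnum_fails))  # prints ['4tvp', '5fyj']
```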

AndrewCRMartin commented 2 years ago

Of the pdbabnum fails:

2znx  3mug  3q6f  3wbd  4rqq  4toy  5d9q  5dt1  5gru  5kov
5ud9  5uy3  5wb1  5wdu  5ywf  6bjz  6qb4  6qb9  6qf9  6qfc
6uuh  6uum  6v6w  6w5a

Non-standard amino acids

Start not found

Length problems

Abnum fails

Multiple fails

AndrewCRMartin commented 2 years ago

Non-standard AAs are now working in dbc3d14972e199040dff646e385bcf46bb781222 for

AndrewCRMartin commented 2 years ago

Current fail list:

1i3u - abnum
1i3v - abnum
2znx - scFv: splitting now works, but numbering still fails because pdb2pir breaks (the alignment doesn't work properly)
3mlr - region lengths are out of bounds
3mls    
3mlt
3wbd - FIXED (Start of patch sequence not found within the first 50 residues of the PDB file)
4b50
4rqq - abnum
4toy - abnum
4tvp - abnum
5d9q - Unable to read patch file **and** Start of patch sequence not found within the first 50 residues of the PDB file
5dt1 - abnum
5fyj  
5fyk  
5fyl  
5gru - NOT FIXED (Start of patch sequence not found within the first 50 residues of the PDB file)
5kov - Some abnum fails
5mp6
5nm0  
5u7m  
5u7o  
5ud9 - abnum
5utf  
5uty 
5uy3 - Patch file: Error: Unable to number sequence 2: region lengths are out of bounds.
5w1g
5w1m
5wb1 - FIXED (Start of patch sequence not found within the first 50 residues of the PDB file) but this is a special case
5wdu - abnum
5ywf - pdbpatchnumbering: Unable to read patch file
5ywp   
6ch7  
6ch8  
6ch9  
6chb  
6de7
6e9q  
6qb4 - FIXED (Start of patch sequence not found within the first 50 residues of the PDB file)
6qb9 - FIXED (Start of patch sequence not found within the first 50 residues of the PDB file)
6qd6
6qf9 - FIXED (Start of patch sequence not found within the first 50 residues of the PDB file)
6qfc - FIXED (Start of patch sequence not found within the first 50 residues of the PDB file)
6utk  
6uuh - abnum
6uum - abnum
6v6w - abnum
6vtt   
6w9g   
6xv8
6xzf  
6y0e
6y1r
7aej
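
The entries with no note are the ones still to be diagnosed. A list in this `code - note` form could be split as below. This is an illustrative sketch only; `parse_fail_list` is a hypothetical helper and the sample is a short extract of the list above.

```python
def parse_fail_list(text):
    """Split 'code - note' lines into a {code: note} dict.

    Lines with no '-' separator get an empty note.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first '-' only, so notes may contain further dashes
        code, _, note = line.partition("-")
        entries[code.strip()] = note.strip()
    return entries


sample_list = "1i3u - abnum\n3mls    \n4tvp  - abnum\n5fyj  \n"
entries = parse_fail_list(sample_list)
undiagnosed = [code for code, note in entries.items() if not note]
print(undiagnosed)  # prints ['3mls', '5fyj']
```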

AndrewCRMartin commented 1 year ago

Closing this and adding a new summary