AndrewCRMartin / absplit

Code to split an antibody PDB file into Fv fragments with their antigens
GNU General Public License v3.0

Issues associated with .cho files #8

Closed biochunan closed 2 years ago

biochunan commented 2 years ago
  1. Empty .cho files
    pdb1p7k_0H.cho, pdb1s78_0P.cho, pdb1xct_0P.cho, pdb1ahw_0P.cho
  2. .cho files not found: see below for a list of identifiers
    pdb6e9q_2, pdb6e9q_3, pdb5dt1_0,  pdb4rqq_2, pdb4rqq_1, pdb3mug_1, pdb3mug_4, pdb3mug_3, pdb3mug_0,  pdb5uy3_0
    pdb4toy_0, pdb6uum_1, pdb6v6w_1H, pdb6uuh_0, pdb6uuh_1, pdb5ud9_0, pdb3q6f_0, pdb3q6f_3, pdb3q6f_2,  pdb3q6f_5
    pdb5mp6_0, pdb3mlt_2, pdb3mlt_3,  pdb5w1g_0, pdb6w5a_0, pdb6bjz_0, pdb5ywf_0, pdb5ywf_1, pdb6qd6_1,  pdb6qd6_0, 
    pdb6xv8_0, pdb5wb1_0, pdb6qb4_0,  pdb6qf9_0, pdb6qfc_0, pdb6qb9_1, pdb6qb9_0, pdb6qf9_1, pdb5kov_10, pdb5kov_3, 
    pdb5kov_1, pdb5kov_7, pdb5kov_5,  pdb3wbd_3, pdb3wbd_1, pdb5nm0_2, pdb5d9q_8, pdb5wdu_1, pdb5d9q_2,  pdb5d9q_5, 
    pdb5wdu_5, pdb5gru_3, pdb2znx_1,  pdb2znx_3, pdb6y1r_3, pdb1i3v_0, pdb6y0e_1, pdb6y0e_0, pdb4b50_0
  3. .cho files where neither heavy nor light chains are present
    pdb6e9q_1P,  pdb6e9q_0P,  pdb6vtt_0PH, pdb4rqq_0P,   pdb3mug_5PH, pdb3mug_2PH, pdb5uy3_1P,  pdb5uty_0PH, pdb5u7o_0PH, 
    pdb5fyj_0PH, pdb5fyl_0PH, pdb4tvp_1PH, pdb5wdu_11PH, pdb6de7_0PH, pdb5utf_1PH, pdb5fyk_0PH, pdb5u7m_0PH, pdb5wdu_9PH, 
    pdb6uum_0,   pdb6v6w_3H,  pdb6utk_1PH, pdb6chb_0P,   pdb6ch9_1PH, pdb6chb_3P,  pdb6ch8_1PH, pdb6chb_5P,  pdb6ch7_1PH, 
    pdb3q6f_4P,  pdb3q6f_1P,  pdb5mp6_1P,  pdb3mlt_1P,   pdb3mls_3P,  pdb3mlr_0P,  pdb3mls_1P,  pdb3mls_0P,  pdb3mlt_0P, 
    pdb3mls_2P,  pdb5w1m_1P,  pdb5w1m_2P,  pdb5w1m_3P,   pdb5w1m_0P,  pdb6w9g_0P,  pdb6w9g_2P,  pdb6w9g_1P,  pdb5ywp_0P,  
    pdb5ywp_1P,  pdb6qd6_2P,  pdb6xv8_1P,  pdb7kqy_1P,   pdb5vm4_4P,  pdb5vm4_1P,  pdb5vm4_0P,  pdb5kov_8P,  pdb3lrh_2P,  
    pdb3lrh_3P,  pdb3lrh_0P,  pdb2znx_0PH, pdb2znx_2PH,  pdb6y1r_0P,  pdb6y1r_4P,  pdb6y1r_1P,  pdb6y1r_2P,  pdb1i3v_1P,  
    pdb1i3u_0H,  pdb6xzf_0P,  pdb7aej_0P
  4. .cho files where the length of SEQRES < ATMSEQ:
    pdb1dee_2 (both HL), pdb1dee_1P (both HL),
    pdb3j42_1 (L), pdb3j42_2P (L), pdb3j42_0P (L)
    e.g. pdb1dee_2.cho alignment with original chain

- `pdb1dee_2.cho chain H` vs. original `1dee chain F`

pdb1dee_2.cho_H  QVQLVESGGGVVQPGKSLRLSCAASGFTFSGYGMHWVRQAPGKGLEWVALISYDESNKYY
1dee_F           QVQLVESGGGVVQPGKSLRLSCAASGFTFSGYGMHWVRQAPGKGLEWVALISYDESNKYY

pdb1dee_2.cho_H  ADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKVKFYDPTAPNDYWGQGTLVTVS
1dee_F           ADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKVKFYDPTAPNDYWGQGTLVTVS

pdb1dee_2.cho_H  QVQLVES------GGGVVQPGKSLRLSCAASGFTFSGYGMHWVRQAPGKGLEWVALISYD
1dee_F           SGSASAPTLFPLVSCENSNPSSTVAVGCLAQDFLPDSITFSWKYKNNSDISSTRGFPSVL

pdb1dee_2.cho_H  ESNKYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKVKFYDPTAPNDYWGQG
1dee_F           RGGKYAATS--------------QVLLPSKDVAQGTNEHVVCKV-Q--H---PNGNKEKD

pdb1dee_2.cho_H  TLVTVS
1dee_F           VPL---

- `pdb1dee_2.cho chain L` vs. original `1dee chain E`

CLUSTAL O(1.2.3) multiple sequence alignment

pdb1dee_2.cho_L  DIQMTQSPSSLSASVGDRVTITCRTSQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPS
1dee_E           DIQMTQSPSSLSASVGDRVTITCRTSQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPS

pdb1dee_2.cho_L  RFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSAPRTFGQGTKVEIKRTDIQMTQ---SP
1dee_E           RFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSAPRTFGQGTKVEIKRTVAAPSVFIFPP

pdb1dee_2.cho_L  SSLSASVGDRVTITCRTSQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGSG
1dee_E           SDEQLKSG-TASVVCL-------LNNFYPREAK----VQWKVDNALQSGNSQESVTEQDS

pdb1dee_2.cho_L  T------DFTLTISSLQPEDFATYYCQQSY---SA--PRTFGQGTKVEIKRT
1dee_E           KDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC------


The second half of both the heavy and light chains in the .cho file does not align with the original chain. Does this suggest an issue with the truncation process?
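
One way to flag this kind of case without running an alignment is to check whether the start of a chain sequence occurs again further along the same sequence. The following is only a minimal sketch: `find_repeat_start` and `probe_len` are hypothetical names for illustration, not part of absplit, and it assumes the chain sequence has already been extracted from the .cho file as a one-letter string.

```python
def find_repeat_start(seq, probe_len=10):
    """Return the index at which the first `probe_len` residues of `seq`
    occur again later in `seq`, or -1 if they do not recur.

    Hypothetical helper for illustration only; not part of absplit.
    """
    if len(seq) < probe_len:
        return -1
    probe = seq[:probe_len]
    # Start searching at index 1 so the match at position 0 is skipped.
    return seq.find(probe, 1)
```

Applied to the ungapped `pdb1dee_2.cho_H` sequence shown above, a non-negative result would mark where the sequence starts over again; a result of -1 would mean the opening residues do not recur.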


5. special case: 4hjj_1P 

[4HJJ](https://www.rcsb.org/structure/4HJJ)  is a **Dual Variable Domain** Immunoglobulin (DVD-Ig)

The CDRs of the variable domain in pdb4hjj_1P.cho do not interact with the antigen.

In the original PDB file of 4HJJ, residues **H126-H246** and **L114-L224** appear, from the structure, to form the second variable domain, which does interact with the antigen. This looks like the correct variable domain to truncate.
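
For reference, the first three categories in the list above (empty, not found, neither heavy nor light chain present) can be screened with a short script. This is a rough sketch under stated assumptions: it assumes .cho files are PDB-format text, that the chain ID sits in column 22 of `ATOM` records, and that heavy and light chains are labelled `H` and `L`. The function names `chain_ids` and `triage` are made up for this example and are not part of absplit.

```python
from pathlib import Path


def chain_ids(path):
    """Collect chain IDs from ATOM records (PDB column 22, index 21)."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith("ATOM") and len(line) > 21:
                ids.add(line[21])
    return ids


def triage(path):
    """Classify a .cho file as missing, empty, lacking H/L chains, or ok.

    Assumes heavy and light chains are labelled 'H' and 'L'.
    """
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    if not chain_ids(p) & {"H", "L"}:
        return "no H or L chain"
    return "ok"
```

Calling `triage` on each expected file name, for example `triage("pdb1p7k_0H.cho")`, would give one label per file that can be tallied against the lists above.
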
AndrewCRMartin commented 2 years ago

Duplicate of #6 and #7 except for the 4hjj special case now opened as #9