biochunan / CDRConformationClassification

Antibody CDR loop conformation clustering using Affinity Propagation
Apache License 2.0

Failed on AbDb file: 1qd0_0H #2

Closed: biochunan closed 10 months ago

biochunan commented 10 months ago

Describe the bug: cdrclu failed on 1qd0_0H.

To Reproduce: steps to reproduce the behavior:

  1. Running environment
    • Version used: 0.1.1
    • Operating system and version: Ubuntu 22.04
  2. Input data: AbDb 1qd0_0H
  3. Running commands:
    $ cd tmp 
    $ cdrclu -db /AbDb -o tmp -c L3 1qd0_0H
  4. Error
    [01/02/24 14:59:25] WARNING  1qd0_0H chain chain_type='H' not exist ...   app.py:171
    14:59:25 {/home/vscode/.conda/envs/abag_analysis/lib/python3.11/site-packages/cdrclass/examine_abdb_struct.py:282} [ERROR] examine_abdb_struct - pdb1qd0_0H chain_type='L' does not exist ... FAILED [MainThread]
                        WARNING  1qd0_0H chain chain_type='L' not exist ...   app.py:171
    Traceback (most recent call last):
    File "/home/vscode/.conda/envs/abag_analysis/bin/cdrclu", line 8, in <module>
    sys.exit(app())
             ^^^^^
    File "/home/vscode/.conda/envs/abag_analysis/lib/python3.11/site-packages/cdrclass/app.py", line 930, in app
    main(args=args)
    File "/home/vscode/.conda/envs/abag_analysis/lib/python3.11/site-packages/cdrclass/app.py", line 906, in main
    results, metadata = worker(
                        ^^^^^^^
    File "/home/vscode/.conda/envs/abag_analysis/lib/python3.11/site-packages/cdrclass/app.py", line 779, in worker
    criteria, struct_df, metadata = process_single_mar_file(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/vscode/.conda/envs/abag_analysis/lib/python3.11/site-packages/cdrclass/app.py", line 176, in process_single_mar_file
    criteria["chain_length_okay"] = assert_seqres_atmseq_length(struct_fp=struct_fp,
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/vscode/.conda/envs/abag_analysis/lib/python3.11/site-packages/cdrclass/examine_abdb_struct.py", line 356, in assert_seqres_atmseq_length
    assert len(seq) >= len(atmseq[cid])
                           ~~~~~~^^^^^
    KeyError: 'A'

Expected behavior: NA

Screenshots: NA

Additional context: NA

biochunan commented 10 months ago

Issue Overview: The file pdb1qd0_0H.mar (AbDb version: 20220926) contains chain mapping information that the current parsing logic misinterprets. Specifically, the hapten is being incorrectly processed as if it were a protein chain.

Details: Below is the relevant section from the file that illustrates the chain mapping:

REMARK 950 RESOL crystal, 2.50A/21.00%
REMARK 950 CHAIN-TYPE  LABEL ORIGINAL
REMARK 950 CHAIN H     H     A
SEQRES   1 H  128  GLN VAL GLN LEU GLN GLU SER GLY GLY GLY LEU VAL GLN 
SEQRES   2 H  128  ALA GLY GLY SER LEU ARG LEU SER CYS ALA ALA SER GLY 
SEQRES   3 H  128  ARG ALA ALA SER GLY HIS GLY HIS TYR GLY MET GLY TRP 
SEQRES   4 H  128  PHE ARG GLN VAL PRO GLY LYS GLU ARG GLU PHE VAL ALA 
SEQRES   5 H  128  ALA ILE ARG TRP SER GLY LYS GLU THR TRP TYR LYS ASP 
SEQRES   6 H  128  SER VAL LYS GLY ARG PHE THR ILE SER ARG ASP ASN ALA 
SEQRES   7 H  128  LYS THR THR VAL TYR LEU GLN MET ASN SER LEU LYS GLY 
SEQRES   8 H  128  GLU ASP THR ALA VAL TYR TYR CYS ALA ALA ARG PRO VAL 
SEQRES   9 H  128  ARG VAL ALA ASP ILE SER LEU PRO VAL GLY PHE ASP TYR 
SEQRES  10 H  128  TRP GLY GLN GLY THR GLN VAL THR VAL SER SER 
SEQRES   1 A    1  RR6 
ATOM      1  N   GLN H   1     -18.952  37.800 -10.339  1.00 45.36           N  
ATOM      2  CA  GLN H   1     -18.028  37.324  -9.266  1.00 42.98           C  
ATOM      3  C   GLN H   1     -18.639  36.156  -8.495  1.00 41.75           C  
ATOM      4  O   GLN H   1     -19.862  35.991  -8.475  1.00 41.32           O  

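The chain of interest is labeled H (originally chain A in the PDB entry, according to the REMARK 950 CHAIN line) and is followed by its SEQRES records and atom coordinates. The last SEQRES record, for chain A, lists a single RR6 residue: this is the hapten.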
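To make the mismatch visible, here is a minimal sketch that reads the three kinds of chain information from the excerpt above. It assumes that the REMARK 950 CHAIN lines are whitespace-separated as CHAIN-TYPE, LABEL, ORIGINAL (matching the header line) and that SEQRES and ATOM records follow the fixed-width PDB columns. The helper name read_chain_records is hypothetical and is not part of cdrclass.

```python
# Excerpt of pdb1qd0_0H.mar from this issue (most SEQRES and ATOM lines omitted).
MAR_EXCERPT = """\
REMARK 950 RESOL crystal, 2.50A/21.00%
REMARK 950 CHAIN-TYPE  LABEL ORIGINAL
REMARK 950 CHAIN H     H     A
SEQRES   1 H  128  GLN VAL GLN LEU GLN GLU SER GLY GLY GLY LEU VAL GLN
SEQRES   1 A    1  RR6
ATOM      1  N   GLN H   1     -18.952  37.800 -10.339  1.00 45.36           N
ATOM      2  CA  GLN H   1     -18.028  37.324  -9.266  1.00 42.98           C
"""


def read_chain_records(text):
    """Hypothetical helper returning (declared, seqres_chains, atom_chains).

    declared      -- {label: (chain_type, original_id)} from REMARK 950 CHAIN lines
    seqres_chains -- chain IDs found in SEQRES records, in file order
    atom_chains   -- chain IDs found in ATOM records, in file order
    """
    declared = {}
    seqres_chains = {}  # dict used as an insertion-ordered set
    atom_chains = {}
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("REMARK 950 CHAIN ") and len(fields) == 6:
            # REMARK 950 CHAIN <type> <label> <original>; skips the CHAIN-TYPE header
            chain_type, label, original = fields[3:6]
            declared[label] = (chain_type, original)
        elif line.startswith("SEQRES"):
            seqres_chains[line[11]] = None  # chain ID is column 12 (0-based index 11)
        elif line.startswith("ATOM"):
            atom_chains[line[21]] = None  # chain ID is column 22 (0-based index 21)
    return declared, list(seqres_chains), list(atom_chains)


declared, seqres_chains, atom_chains = read_chain_records(MAR_EXCERPT)
print(declared)       # {'H': ('H', 'A')}
print(seqres_chains)  # ['H', 'A']
print(atom_chains)    # ['H']
```

On this excerpt, chain A appears only in SEQRES. It is not declared in REMARK 950 and has no ATOM records, so REMARK 950 could be used as a filter on which SEQRES chains to treat as antibody chains.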

Encountered Problem: The parser treats the hapten's SEQRES entry (chain A) as a protein chain. There is no atom-derived sequence for chain A, so the lookup atmseq[cid] in assert_seqres_atmseq_length raises KeyError: 'A'. Beyond the crash, misclassifying non-protein entities as chains could lead to inaccurate molecular structure representations and downstream analyses.
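The traceback ends in assert len(seq) >= len(atmseq[cid]). The sketch below is a hypothetical simplification of that check and not the cdrclass source. It assumes seqres and atmseq are dicts that map chain IDs to residue lists. The naive version indexes atmseq with every SEQRES chain ID and fails in the same way. The guarded version compares only chains that are present in both dicts and reports the ones it skipped. The toy sequences are illustrative, not real residue counts.

```python
def check_lengths_naive(seqres, atmseq):
    """Mimics the failing pattern: look up every SEQRES chain ID in atmseq."""
    for cid, seq in seqres.items():
        assert len(seq) >= len(atmseq[cid])  # KeyError if cid has no atom sequence
    return True


def check_lengths_guarded(seqres, atmseq):
    """Hypothetical fix: compare only chains present in both sources.

    Returns (ok, skipped); skipped lists SEQRES chains that have no atom sequence.
    """
    skipped = [cid for cid in seqres if cid not in atmseq]
    ok = all(len(seqres[cid]) >= len(atmseq[cid]) for cid in seqres if cid in atmseq)
    return ok, skipped


# Toy data shaped like pdb1qd0_0H.mar: chain H is protein, chain A is the RR6 hapten.
seqres = {"H": ["GLN"] * 128, "A": ["RR6"]}
atmseq = {"H": ["GLN"] * 128}

try:
    check_lengths_naive(seqres, atmseq)
except KeyError as err:
    print("naive check failed with KeyError:", err)  # prints ... KeyError: 'A'

print(check_lengths_guarded(seqres, atmseq))  # (True, ['A'])
```

The guarded version returns the skipped IDs instead of dropping them silently. This lets the caller log a warning, similar to the existing "chain_type ... not exist" warnings in the log above.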

Proposed Solution: Refine the parsing logic so that it distinguishes protein chains from non-protein entities such as haptens (🤔 try biopandas). Adding a check that separates these entities before the SEQRES/atom sequence length comparison would prevent such errors.
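One possible form of that check uses residue names: a SEQRES chain made up mostly of non-standard residues, such as the single RR6 residue here, is unlikely to be a protein chain. The sketch below is an assumption-based heuristic that uses only the standard library; it does not use biopandas, which could supply the parsed records instead. The names is_protein_chain and protein_chains and the min_fraction threshold are hypothetical.

```python
# The 20 standard amino acids as three-letter PDB residue names.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def is_protein_chain(residues, min_fraction=0.5):
    """Heuristic: a chain is protein if enough of its residues are standard amino acids."""
    if not residues:
        return False
    n_standard = sum(1 for r in residues if r.upper() in STANDARD_AA)
    return n_standard / len(residues) >= min_fraction


def protein_chains(seqres):
    """Split SEQRES chains into (kept, dropped) using is_protein_chain."""
    kept = {cid: res for cid, res in seqres.items() if is_protein_chain(res)}
    dropped = sorted(set(seqres) - set(kept))
    return kept, dropped


seqres = {
    "H": "GLN VAL GLN LEU GLN GLU SER GLY GLY GLY LEU VAL GLN".split(),  # first SEQRES line of H
    "A": ["RR6"],  # the hapten
}
kept, dropped = protein_chains(seqres)
print(sorted(kept), dropped)  # ['H'] ['A']
```

The check uses a fraction instead of requiring every residue to be standard. This way, protein chains that contain modified residues (for example selenomethionine, MSE) are not discarded.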

Additional Information: Attached is a screenshot of 1qd0_0H for visual reference:

[Screenshot of 1qd0_0H]

Expected Outcome: Enhance the accuracy of the chain mapping process and ensure that non-protein entities are not misclassified.