3dem / model-angelo

Automatic atomic model building program for cryo-EM maps
MIT License

model-angelo failed on big target #75

Closed by wang3702 9 months ago

wang3702 commented 9 months ago

I ran model-angelo on a large target (https://www.emdataresource.org/EMD-11032) and it failed with the error below. Could you please take a look and suggest how to fix it? I have hit similar errors on other large targets. I ran it in the with-sequence mode.

2023-09-28 at 10:47:31 | INFO | Loaded module from step: 483863
2023-09-28 at 10:50:31 | ERROR | Error in ModelAngelo
Traceback (most recent call last):

  File "/apps/miniconda38/envs/model_angelo/bin/model_angelo", line 33, in <module>
    sys.exit(load_entry_point('model-angelo==1.0.0', 'console_scripts', 'model_angelo')())
    │   │    └ <function importlib_load_entry_point at 0x7f4d7be50280>
    │   └ <built-in function exit>
    └ <module 'sys' (built-in)>
  File "/apps/miniconda38/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/__main__.py", line 52, in main
    args.func(args)
    │    │    └ Namespace(volume_path='/home/kihara/wang3702/turtle_scratch/model_angelo_pdrna_complex/DRNA_maps/11032.mrc', protein_fasta='/...
    │    └ <function main at 0x7f4cba9ab0a0>
    └ Namespace(volume_path='/home/kihara/wang3702/turtle_scratch/model_angelo_pdrna_complex/DRNA_maps/11032.mrc', protein_fasta='/...
> File "/apps/miniconda38/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/apps/build.py", line 241, in main
    gnn_output = gnn_infer(gnn_infer_args)
                 │         └ {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': Tru...
                 └ <function infer at 0x7f4cba8f6170>
  File "/apps/miniconda38/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/gnn/inference.py", line 92, in infer
    protein = get_lm_embeddings_for_protein(lang_model, batch_converter, protein)
              │                             │           │                └ Protein(atom_positions=None, atomc_positions=None, aatype=None, atom_mask=None, atomc_mask=None, residue_index=None, chain_in...
              │                             │           └ <esm.data.BatchConverter object at 0x7f4c6075bf40>
              │                             └ ProteinBertModel(
              │                                 (embed_tokens): Embedding(33, 1280, padding_idx=1)
              │                                 (layers): ModuleList(
              │                                   (0-32): 33 x TransformerLa...
              └ <function get_lm_embeddings_for_protein at 0x7f4cba8f6290>
  File "/apps/miniconda38/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/data/generate_complete_prot_files.py", line 32, in get_lm_embeddings_for_protein
    [result[s]["representations"][33].cpu().numpy() for s in seq_names], axis=0,
     │                                                       └ ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '2...
     └ {'0': {'representations': {33: tensor([[ 0.0787,  0.0393,  0.2289,  ..., -0.3559, -0.4995, -0.0829],
               [ 0.4218,  0.035...
  File "/apps/miniconda38/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/data/generate_complete_prot_files.py", line 32, in <listcomp>
    [result[s]["representations"][33].cpu().numpy() for s in seq_names], axis=0,
     │      │                                           └ '87'
     │      └ '87'
     └ {'0': {'representations': {33: tensor([[ 0.0787,  0.0393,  0.2289,  ..., -0.3559, -0.4995, -0.0829],
               [ 0.4218,  0.035...

KeyError: '87'

jamaliki commented 9 months ago

Hi,

This is not because the target is large; I think something is wrong with your sequence file. Would you mind sharing it?

Best, Kiarash.

wang3702 commented 9 months ago

Thank you for such a quick response! Yes, I found the problem: some protein sequences were missing from the file, which was my mistake. I will run it again, check the results, and let you know soon.

wang3702 commented 9 months ago

After I fixed the errors in the FASTA file, it works well. Thank you so much for your help!
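
For anyone who lands here with a similar KeyError, a quick sanity check of the FASTA file before running may save some time. The following is only a minimal sketch and is not part of ModelAngelo: the function names `read_fasta` and `find_fasta_problems`, the `expected_count` parameter, and the set of accepted residue letters are all assumptions made for illustration. It flags records with an empty sequence, records with unexpected characters, and (optionally) a record count that differs from what you expect for the complex.

```python
from typing import List, Optional, Tuple

# Assumption: one-letter amino-acid codes plus a few ambiguity codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZUO")


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, keeping empty records."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_fasta_problems(text: str, expected_count: Optional[int] = None) -> List[str]:
    """Return human-readable descriptions of suspicious FASTA records."""
    problems: List[str] = []
    records = read_fasta(text)
    for index, (header, sequence) in enumerate(records):
        if not sequence:
            problems.append(f"record {index} ({header!r}): empty sequence")
        unexpected = sorted(set(sequence.upper()) - VALID_RESIDUES)
        if unexpected:
            problems.append(
                f"record {index} ({header!r}): unexpected characters {unexpected}"
            )
    if expected_count is not None and len(records) != expected_count:
        problems.append(f"found {len(records)} records, expected {expected_count}")
    return problems
```

To use it, read the file passed as the protein FASTA into a string (for example with `pathlib.Path(path).read_text()`), call `find_fasta_problems` on it, and print whatever it returns. An empty list only means that none of these particular checks fired, not that the file is guaranteed to work.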