3dem / model-angelo

Automatic atomic model building program for cryo-EM maps
MIT License

V1.0 stops after first refinement iteration #44

Closed: rfronzes closed this issue 1 year ago

rfronzes commented 1 year ago

Dear ModelAngelo developers,

First, many thanks for the amazing work you are doing!

We have a bug with the new version of ModelAngelo (v1.0).

The C-alpha prediction runs fine and the first refinement round starts. At the end of that first round, we get an error message and the program stops (see below).

Many thanks

Rémi

2023-05-17 at 14:25:23 | INFO | ModelAngelo with args: {'volume_path': 'map.mrc', 'protein_fasta': 'AdhE-SP.fa', 'rna_fasta': None, 'dna_fasta': None, 'output_dir': 'test2', 'mask_path': None, 'device': None, 'config_path': None, 'model_bundle_name': 'nucleotides', 'model_bundle_path': None, 'keep_intermediate_results': False, 'pipeline_control': False, 'func': <function main at 0x7f2c33703d00>}
2023-05-17 at 14:25:23 | INFO | Initial C-alpha prediction with args: {'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'box_size': 64, 'stride': 16, 'dont_mask_input': True, 'threshold': 0.05, 'save_real_coordinates': False, 'save_cryo_em_grid': False, 'do_nucleotides': True, 'save_backbone_trace': False, 'save_output_grid': False, 'crop': 6, 'log_dir': '/srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha', 'map_path': 'map.mrc', 'output_path': 'test2/see_alpha_output', 'mask_path': None, 'device': None, 'auto_mask': False}
2023-05-17 at 14:25:24 | INFO | Using model file /srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha/model.py
2023-05-17 at 14:25:24 | INFO | Using checkpoint file /srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha/chkpt.torch
2023-05-17 at 14:25:25 | INFO | Input structure has shape: (186, 186, 186)
2023-05-17 at 14:25:25 | INFO | Running with these arguments:
2023-05-17 at 14:25:25 | INFO | {'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'box_size': 64, 'stride': 16, 'dont_mask_input': True, 'threshold': 0.05, 'save_real_coordinates': False, 'save_cryo_em_grid': False, 'do_nucleotides': True, 'save_backbone_trace': False, 'save_output_grid': False, 'crop': 6, 'log_dir': '/srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha', 'map_path': 'map.mrc', 'output_path': 'test2/see_alpha_output', 'mask_path': None, 'device': None, 'auto_mask': False}
2023-05-17 at 14:28:15 | INFO | Model prediction done, took 170.09 seconds for 512 sliding windows
2023-05-17 at 14:28:15 | INFO | Average time is 332.199 ms
2023-05-17 at 14:28:15 | INFO | Starting Cα grid to points...
2023-05-17 at 14:28:16 | INFO | Have 25615 Cα points before pruning and 5318 after pruning
2023-05-17 at 14:28:18 | INFO | Starting P grid to points...
2023-05-17 at 14:28:18 | INFO | Have 1448 P points before pruning and 354 after pruning
2023-05-17 at 14:28:18 | INFO | Finished inference!
2023-05-17 at 14:28:18 | INFO | GNN model refinement round 1 with args: {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': True, 'seq_attention_batch_size': 200, 'fp16': False, 'batch_size': 1, 'voxel_size': 1.0, 'map': 'map.mrc', 'protein_fasta': 'AdhE-SP.fa', 'rna_fasta': None, 'dna_fasta': None, 'struct': 'test2/see_alpha_output/see_alpha_merged_output.cif', 'output_dir': 'test2/gnn_output_round_1', 'model_dir': '/srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/gnn', 'device': None, 'write_hmm_profiles': False, 'refine': False}
2023-05-17 at 14:28:18 | INFO | Loaded module from step: 483863
2023-05-17 at 14:33:48 | ERROR | Error in ModelAngelo
Traceback (most recent call last):
File "/app/anaconda3/envs/model_angelo/bin/model_angelo", line 33, in <module>
sys.exit(load_entry_point('model-angelo==1.0.0', 'console_scripts', 'model_angelo')())
│   │    └ <function importlib_load_entry_point at 0x7f2d79967d90>
│   └ <built-in function exit>
└ <module 'sys' (built-in)>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/main.py", line 52, in main
args.func(args)
│    │    └ Namespace(volume_path='map.mrc', protein_fasta='AdhE-SP.fa', rna_fasta=None, dna_fasta=None, output_dir='test2', mask_path=No...
│    └ <function main at 0x7f2c33703d00>
└ Namespace(volume_path='map.mrc', protein_fasta='AdhE-SP.fa', rna_fasta=None, dna_fasta=None, output_dir='test2', mask_path=No...
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/apps/build.py", line 241, in main
gnn_output = gnn_infer(gnn_infer_args)
             │         └ {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': Tru...
             └ <function infer at 0x7f2c33efa9e0>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/gnn/inference.py", line 184, in infer
final_results_to_cif(
└ <function final_results_to_cif at 0x7f2c33703520>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/gnn/flood_fill.py", line 251, in final_results_to_cif
final_results["aa_logits"][existence_mask][c] for c in pruned_chains
│                          └ array([ True, True, True, ..., True, True, True])
└ {'pred_positions': array([[ 97.94374 , 121.828865, 35.82869 ],
         [ 93.678734, 123.0471 , 36.359264],
         [ 98.8719...

NameError: name 'pruned_chains' is not defined
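
When a log like the one above is pasted as a single block, it can help to split it back into entries and locate the first `ERROR`. The following is a minimal sketch, assuming only the `YYYY-MM-DD at HH:MM:SS | LEVEL | message` layout visible in the log. The names `split_entries` and `first_error` are hypothetical helpers and are not part of ModelAngelo.

```python
import re

# Every entry begins with a stamp such as "2023-05-17 at 14:28:18 | INFO | ".
# This pattern is inferred from the pasted log, not from ModelAngelo's source.
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} at \d{2}:\d{2}:\d{2}) \| (\w+) \| "
)


def split_entries(blob):
    """Split a pasted log into (timestamp, level, message) tuples."""
    # With two capture groups, re.split returns
    # [prefix, ts1, level1, msg1, ts2, level2, msg2, ...].
    parts = ENTRY_START.split(blob)
    entries = []
    for i in range(1, len(parts), 3):
        entries.append((parts[i], parts[i + 1], parts[i + 2].strip()))
    return entries


def first_error(entries):
    """Return the first entry logged at ERROR level, or None."""
    return next((e for e in entries if e[1] == "ERROR"), None)


if __name__ == "__main__":
    sample = (
        "2023-05-17 at 14:28:18 | INFO | Finished inference! "
        "2023-05-17 at 14:33:48 | ERROR | Error in ModelAngelo"
    )
    print(first_error(split_entries(sample)))
```

The timestamps on either side of the error show how long the failing stage ran before it stopped.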

jamaliki commented 1 year ago

Hi,

This is strange. I have made a change; could you please update your installation and try again?

Best, Kiarash.

rfronzes commented 1 year ago

Hi

Unfortunately, it crashes at the same point, but with a different error message.

Many thanks

Rémi


2023-05-17 at 16:11:49 | INFO | ModelAngelo with args: {'volume_path': 'map.mrc', 'protein_fasta': 'AdhE-SP.fa', 'rna_fasta': None, 'dna_fasta': None, 'output_dir': 'test-commit', 'mask_path': None, 'device': None, 'config_path': None, 'model_bundle_name': 'nucleotides', 'model_bundle_path': None, 'keep_intermediate_results': False, 'pipeline_control': False, 'func': <function main at 0x7f30b7b0bc70>}
2023-05-17 at 16:11:49 | INFO | Initial C-alpha prediction with args: {'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'box_size': 64, 'stride': 16, 'dont_mask_input': True, 'threshold': 0.05, 'save_real_coordinates': False, 'save_cryo_em_grid': False, 'do_nucleotides': True, 'save_backbone_trace': False, 'save_output_grid': False, 'crop': 6, 'log_dir': '/srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha', 'map_path': 'map.mrc', 'output_path': 'test-commit/see_alpha_output', 'mask_path': None, 'device': None, 'auto_mask': False}
2023-05-17 at 16:11:49 | INFO | Using model file /srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha/model.py
2023-05-17 at 16:11:49 | INFO | Using checkpoint file /srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha/chkpt.torch
2023-05-17 at 16:11:51 | INFO | Input structure has shape: (186, 186, 186)
2023-05-17 at 16:11:51 | INFO | Running with these arguments:
2023-05-17 at 16:11:51 | INFO | {'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'box_size': 64, 'stride': 16, 'dont_mask_input': True, 'threshold': 0.05, 'save_real_coordinates': False, 'save_cryo_em_grid': False, 'do_nucleotides': True, 'save_backbone_trace': False, 'save_output_grid': False, 'crop': 6, 'log_dir': '/srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha', 'map_path': 'map.mrc', 'output_path': 'test-commit/see_alpha_output', 'mask_path': None, 'device': None, 'auto_mask': False}
2023-05-17 at 16:14:44 | INFO | Model prediction done, took 172.46 seconds for 512 sliding windows
2023-05-17 at 16:14:44 | INFO | Average time is 336.832 ms
2023-05-17 at 16:14:44 | INFO | Starting Cα grid to points...
2023-05-17 at 16:14:45 | INFO | Have 25615 Cα points before pruning and 5318 after pruning
2023-05-17 at 16:14:46 | INFO | Starting P grid to points...
2023-05-17 at 16:14:47 | INFO | Have 1448 P points before pruning and 354 after pruning
2023-05-17 at 16:14:47 | INFO | Finished inference!
2023-05-17 at 16:14:47 | INFO | GNN model refinement round 1 with args: {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': True, 'seq_attention_batch_size': 200, 'fp16': False, 'batch_size': 1, 'voxel_size': 1.0, 'map': 'map.mrc', 'protein_fasta': 'AdhE-SP.fa', 'rna_fasta': None, 'dna_fasta': None, 'struct': 'test-commit/see_alpha_output/see_alpha_merged_output.cif', 'output_dir': 'test-commit/gnn_output_round_1', 'model_dir': '/srv/home/rfuser/.cache/torch/hub/checkpoints/model_angelo_v1.0/nucleotides/gnn', 'device': None, 'write_hmm_profiles': False, 'refine': False}
2023-05-17 at 16:14:47 | INFO | Loaded module from step: 483863
2023-05-17 at 16:20:19 | ERROR | Error in ModelAngelo
Traceback (most recent call last):
File "/app/anaconda3/envs/model_angelo/bin/model_angelo", line 33, in <module>
sys.exit(load_entry_point('model-angelo==1.0.0', 'console_scripts', 'model_angelo')())
│   │    └ <function importlib_load_entry_point at 0x7f31fdd1fd90>
│   └ <built-in function exit>
└ <module 'sys' (built-in)>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/main.py", line 52, in main
args.func(args)
│    │    └ Namespace(volume_path='map.mrc', protein_fasta='AdhE-SP.fa', rna_fasta=None, dna_fasta=None, output_dir='test-commit', mask_p...
│    └ <function main at 0x7f30b7b0bc70>
└ Namespace(volume_path='map.mrc', protein_fasta='AdhE-SP.fa', rna_fasta=None, dna_fasta=None, output_dir='test-commit', mask_p...
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/apps/build.py", line 241, in main
gnn_output = gnn_infer(gnn_infer_args)
             │         └ {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': Tru...
             └ <function infer at 0x7f30b82ca950>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/gnn/inference.py", line 184, in infer
final_results_to_cif(
└ <function final_results_to_cif at 0x7f30b7b0b490>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/gnn/flood_fill.py", line 291, in final_results_to_cif
fix_chains_output = fix_chains_pipeline(
                    └ <function fix_chains_pipeline at 0x7f30b7b0add0>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/utils/hmm_sequence_align.py", line 521, in fix_chains_pipeline
best_match_output = best_match_to_sequences(
                    └ <function best_match_to_sequences at 0x7f30b7b0a290>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/utils/hmm_sequence_align.py", line 211, in best_match_to_sequences
hmm_alignment = get_hmm_alignment(
                └ <function get_hmm_alignment at 0x7f30b7b0a200>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/utils/hmm_sequence_align.py", line 50, in get_hmm_alignment
msas = pyhmmer.hmmer.hmmalign(
       │       │     └ <function hmmalign at 0x7f30b7b096c0>
       │       └ <module 'pyhmmer.hmmer' from '/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/pyhmmer/hmmer.py'>
       └ <module 'pyhmmer' from '/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/pyhmmer/__init__.py'>
File "/app/anaconda3/envs/model_angelo/lib/python3.10/site-packages/pyhmmer/hmmer.py", line 1369, in hmmalign
traces = aligner.compute_traces(hmm, sequences)
         │       │              │    └ DigitalSequenceBlock(pyhmmer.easel.Alphabet.amino(), [<pyhmmer.easel.DigitalSequence object at 0x7f30b442c480>])
         │       │              └ <pyhmmer.plan7.HMM object at 0x7f3076949440>
         │       └ <method 'compute_traces' of 'pyhmmer.plan7.TraceAligner' objects>
         └ TraceAligner()
File "pyhmmer/plan7.pyx", line 8440, in pyhmmer.plan7.TraceAligner.compute_traces
cpdef Traces compute_traces(self, HMM hmm, DigitalSequenceBlock sequences):
      │                           └ <class 'pyhmmer.plan7.HMM'>
      └ <class 'pyhmmer.plan7.Traces'>
File "pyhmmer/plan7.pyx", line 8480, in pyhmmer.plan7.TraceAligner.compute_traces
raise ValueError(f"Invalid HMM: {err_msg}")

ValueError: Invalid HMM: TMD should be 0 for last node

jamaliki commented 1 year ago

I have not seen this before. Could you send me the FASTA file you used?

rfronzes commented 1 year ago

It crashes at the same point even without a FASTA file. Could it come from the map? I tested two different maps. The same maps and FASTA files were working with the previous version of ModelAngelo.

jamaliki commented 1 year ago

It crashes without the Fasta file as well? Could you please provide the log file for that run as well?

Are you willing to share the map with me? I need the map and fasta to be able to see what the issue is.

rfronzes commented 1 year ago

Can I send you the map and FASTA file by email?

I tested build_no_seq again, and it is working now!

It is still not working with the FASTA file.

martinpacesa commented 1 year ago

I am also getting an error when trying to build with RNA and DNA nucleotides. The previous version of ModelAngelo ran fine on the same map with just protein:

```
2023-05-17 at 18:06:00 | INFO | ModelAngelo with args: {'volume_path': '/local/Maps/cryosparc_P2_J341_003_volume_map.mrc', 'protein_fasta': '/local/seq/test.fasta', 'rna_fasta': '/local/seq/test_RNA.fasta', 'dna_fasta': '/local/seq/test_DNA.fasta', 'output_dir': '.', 'mask_path': None, 'device': None, 'config_path': None, 'model_bundle_name': 'nucleotides', 'model_bundle_path': None, 'keep_intermediate_results': False, 'pipeline_control': False, 'func': <function main at 0x2b48b9c3aaf0>}
2023-05-17 at 18:06:01 | INFO | Initial C-alpha prediction with args: {'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'box_size': 64, 'stride': 16, 'dont_mask_input': True, 'threshold': 0.05, 'save_real_coordinates': False, 'save_cryo_em_grid': False, 'do_nucleotides': True, 'save_backbone_trace': False, 'save_output_grid': False, 'crop': 6, 'log_dir': '/local/Pipelines/ModelAngelo/model_angelo_weights/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha', 'map_path': '/local/Maps/cryosparc_P2_J341_003_volume_map.mrc', 'output_path': './see_alpha_output', 'mask_path': None, 'device': None, 'auto_mask': False}
2023-05-17 at 18:06:01 | INFO | Using model file /local/Pipelines/ModelAngelo/model_angelo_weights/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha/model.py
2023-05-17 at 18:06:01 | INFO | Using checkpoint file /local/Pipelines/ModelAngelo/model_angelo_weights/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha/chkpt.torch
2023-05-17 at 18:06:06 | INFO | Input structure has shape: (194, 194, 194)
2023-05-17 at 18:06:06 | INFO | Running with these arguments:
2023-05-17 at 18:06:06 | INFO | {'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'box_size': 64, 'stride': 16, 'dont_mask_input': True, 'threshold': 0.05, 'save_real_coordinates': False, 'save_cryo_em_grid': False, 'do_nucleotides': True, 'save_backbone_trace': False, 'save_output_grid': False, 'crop': 6, 'log_dir': '/local/Pipelines/ModelAngelo/model_angelo_weights/hub/checkpoints/model_angelo_v1.0/nucleotides/c_alpha', 'map_path': '/local/Maps/cryosparc_P2_J341_003_volume_map.mrc', 'output_path': './see_alpha_output', 'mask_path': None, 'device': None, 'auto_mask': False}
2023-05-17 at 18:10:32 | INFO | Model prediction done, took 265.69 seconds for 729 sliding windows
2023-05-17 at 18:10:32 | INFO | Average time is 364.460 ms
2023-05-17 at 18:10:32 | INFO | Starting Cα grid to points...
2023-05-17 at 18:10:33 | INFO | Have 15582 Cα points before pruning and 1887 after pruning
2023-05-17 at 18:10:34 | INFO | Starting P grid to points...
2023-05-17 at 18:10:34 | INFO | Have 6515 P points before pruning and 303 after pruning
2023-05-17 at 18:10:35 | INFO | Finished inference!
2023-05-17 at 18:10:35 | INFO | GNN model refinement round 1 with args: {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': True, 'seq_attention_batch_size': 200, 'fp16': False, 'batch_size': 1, 'voxel_size': 1.0, 'map': '/local/Maps/cryosparc_P2_J341_003_volume_map.mrc', 'protein_fasta': '/local/seq/test.fasta', 'rna_fasta': '/local/seq/test_RNA.fasta', 'dna_fasta': '/local/seq/test_DNA.fasta', 'struct': './see_alpha_output/see_alpha_merged_output.cif', 'output_dir': './gnn_output_round_1', 'model_dir': '/local/Pipelines/ModelAngelo/model_angelo_weights/hub/checkpoints/model_angelo_v1.0/nucleotides/gnn', 'device': None, 'write_hmm_profiles': False, 'refine': False}
2023-05-17 at 18:10:35 | INFO | Loaded module from step: 483863
2023-05-17 at 18:13:06 | ERROR | Error in ModelAngelo
Traceback (most recent call last):
File "/home/pacesa/miniconda3/envs/model_angelo/bin/model_angelo", line 33, in <module>
sys.exit(load_entry_point('model-angelo==1.0.0', 'console_scripts', 'model_angelo')())
│   │    └ <function importlib_load_entry_point at 0x2b4809d32280>
│   └ <built-in function exit>
└ <module 'sys' (built-in)>
File "/home/pacesa/miniconda3/envs/model_angelo/lib/python3.9/site-packages/model_angelo-1.0.0-py3.9.egg/model_angelo/main.py", line 52, in main
args.func(args)
│    │    └ Namespace(volume_path='/local/Maps/cryosparc_P2_J341_003_volume_map.mrc', protein_fas...
│    └ <function main at 0x2b48b9c3aaf0>
└ Namespace(volume_path='/local/Maps/cryosparc_P2_J341_003_volume_map.mrc', protein_fas...
File "/home/pacesa/miniconda3/envs/model_angelo/lib/python3.9/site-packages/model_angelo-1.0.0-py3.9.egg/model_angelo/apps/build.py", line 241, in main
gnn_output = gnn_infer(gnn_infer_args)
             │         └ {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': Tru...
             └ <function infer at 0x2b48b8ec4f70>
File "/home/pacesa/miniconda3/envs/model_angelo/lib/python3.9/site-packages/model_angelo-1.0.0-py3.9.egg/model_angelo/gnn/inference.py", line 184, in infer
final_results_to_cif(
└ <function final_results_to_cif at 0x2b48b9c3a8b0>
File "/home/pacesa/miniconda3/envs/model_angelo/lib/python3.9/site-packages/model_angelo-1.0.0-py3.9.egg/model_angelo/gnn/flood_fill.py", line 251, in final_results_to_cif
final_results["aa_logits"][existence_mask][c] for c in pruned_chains
│                          └ array([ True,  True,  True, ...,  True,  True,  True])
└ {'pred_positions': array([[149.68655 , 158.68929 ,  80.635506],
         [152.87912 , 157.20757 ,  82.420456],
         [151.5113...

NameError: name 'pruned_chains' is not defined
```

jamaliki commented 1 year ago

> Can I send you the map and fasta by Email ?
>
> I tested the build_no_seq again. Now it is working !!
>
> Still not working with the fasta .

Yes please, email is kjamali@mrc-lmb.cam.ac.uk

jamaliki commented 1 year ago

Hi @martinpacesa,

Sorry, this is a bug. It has been fixed; could you please pull the repo again, update the installation, and try again?

IMhallelujahxn commented 1 year ago

I got a similar error after the first refinement iteration, shown below. Could you help figure out the problem?

2023-05-18 at 09:56:47 | INFO | Finished inference!
2023-05-18 at 09:56:47 | INFO | GNN model refinement round 1 with args: {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': True, 'seq_atte$
2023-05-18 at 09:56:47 | INFO | Loaded module from step: 483863
2023-05-18 at 09:57:59 | ERROR | Error in ModelAngelo
Traceback (most recent call last):
File "/home/hxn/anaconda3/envs/model_angelo/bin/model_angelo", line 33, in <module>
sys.exit(load_entry_point('model-angelo==1.0.0', 'console_scripts', 'model_angelo')())
│   │    └ <function importlib_load_entry_point at 0x7fbcf6a5bd90>
│   └ <built-in function exit>
└ <module 'sys' (built-in)>
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/main.py", line 52, in main
args.func(args)
│    │    └ Namespace(volume_path='J345_map.mrc', protein_fasta='BC-preF-3.fasta', rna_fasta=None, dna_fasta=None, output_dir='angelo_out...
│    └ <function main at 0x7fbbb14d1ea0>
└ Namespace(volume_path='J345_map.mrc', protein_fasta='BC-preF-3.fasta', rna_fasta=None, dna_fasta=None, output_dir='angelo_out...
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/apps/build.py", line 241, in main
gnn_output = gnn_infer(gnn_infer_args)
             │         └ {'num_rounds': 3, 'crop_length': 200, 'repeat_per_residue': 1, 'esm_model': 'esm1b_t33_650M_UR50S', 'aggressive_pruning': Tru...
             └ <function infer at 0x7fbbb2054310>
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/gnn/inference.py", line 184, in infer
final_results_to_cif(
└ <function final_results_to_cif at 0x7fbbb14d16c0>
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/gnn/flood_fill.py", line 291, in final_results_to_cif
fix_chains_output = fix_chains_pipeline(
                    └ <function fix_chains_pipeline at 0x7fbbb14d1000>
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/utils/hmm_sequence_align.py", line 521, in fix_chains_pipeline
best_match_output = best_match_to_sequences(
                    └ <function best_match_to_sequences at 0x7fbbb14d04c0>
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/utils/hmm_sequence_align.py", line 211, in best_match_to_sequences
hmm_alignment = get_hmm_alignment(
                └ <function get_hmm_alignment at 0x7fbbb14d0430>
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/model_angelo-1.0.0-py3.10.egg/model_angelo/utils/hmm_sequence_align.py", line 50, in get_hmm_alignment
msas = pyhmmer.hmmer.hmmalign(
       │       │     └ <function hmmalign at 0x7fbbb20af880>
       │       └ <module 'pyhmmer.hmmer' from '/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/pyhmmer/hmmer.py'>
       └ <module 'pyhmmer' from '/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/pyhmmer/__init__.py'>
File "/home/hxn/anaconda3/envs/model_angelo/lib/python3.10/site-packages/pyhmmer/hmmer.py", line 1369, in hmmalign
traces = aligner.compute_traces(hmm, sequences)
         │       │              │    └ DigitalSequenceBlock(pyhmmer.easel.Alphabet.amino(), [<pyhmmer.easel.DigitalSequence object at 0x7fbba0729b80>])
         │       │              └ <pyhmmer.plan7.HMM object at 0x7fbbb2a3cc80>
         │       └ <method 'compute_traces' of 'pyhmmer.plan7.TraceAligner' objects>
         └ TraceAligner()
File "pyhmmer/plan7.pyx", line 8440, in pyhmmer.plan7.TraceAligner.compute_traces
cpdef Traces compute_traces(self, HMM hmm, DigitalSequenceBlock sequences):
      │                           └ <class 'pyhmmer.plan7.HMM'>
      └ <class 'pyhmmer.plan7.Traces'>
File "pyhmmer/plan7.pyx", line 8480, in pyhmmer.plan7.TraceAligner.compute_traces
raise ValueError(f"Invalid HMM: {err_msg}")

ValueError: Invalid HMM: TMD should be 0 for last node

martinpacesa commented 1 year ago

This is now resolved for me, thank you!

jamaliki commented 1 year ago

Hi @IMhallelujahxn,

I am not sure what the problem is. It could be either 1) something strange with your FASTA file, or 2) that you are using an old version of pyHMMER.

To find out, could you please 1) send me (or upload here) your FASTA file, and 2) with the model_angelo conda environment activated, run the following command and report back the result:

python -c 'import pyhmmer; print(pyhmmer.__version__)'

IMhallelujahxn commented 1 year ago

Hi @jamaliki, I used a FASTA file downloaded from the PDB, so that is probably not the problem. The command returned version 0.8.0.

jamaliki commented 1 year ago

@IMhallelujahxn could you please upload the FASTA file anyway? The PDB has a myriad of different conventions. For example, if the FASTA file contains "X" amino acids, it won't work. If it is too much trouble to upload the FASTA file, then please send me the link you used to download it. I need to be able to reproduce your problem so that I can help :)
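
A quick way to rule out the "X" case before sharing a file is to scan it for that residue code. The following is a minimal sketch under stated assumptions: it checks only for "X", the one convention named above, and `find_unknown_residues` is a hypothetical helper that is not part of ModelAngelo.

```python
def find_unknown_residues(fasta_text, unknown="X"):
    """Map each record header to the 1-based positions of `unknown` residues.

    Records without any hit are left out of the result.
    """
    hits = {}
    header = None
    offset = 0  # residues already seen in the current record
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            offset = 0
            continue
        if header is None:
            continue  # sequence text before any header is ignored
        for pos, residue in enumerate(line.upper(), start=offset + 1):
            if residue == unknown:
                hits.setdefault(header, []).append(pos)
        offset += len(line)
    return hits


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = ">chainA\nMKTXAY\nLLX\n>chainB\nMKT\n"
    print(find_unknown_residues(example))
```

An empty result means no "X" was found. It does not show that the file is otherwise acceptable to the program.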

IMhallelujahxn commented 1 year ago

@jamaliki, I have sent the FASTA file by email.

jamaliki commented 1 year ago

Thank you @IMhallelujahxn !

The error `ValueError: Invalid HMM: TMD should be 0 for last node` is related to pyHMMER version 0.8.0.

@rfronzes and @IMhallelujahxn, to fix this problem, please revert to pyHMMER 0.7.1 like so:

pip install pyhmmer==0.7.1 -U
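
To confirm which pyHMMER version an environment ends up with after the reinstall, a small check along these lines may help. It is a sketch: `check_pyhmmer`, `FLAGGED`, and `SUGGESTED` are hypothetical names, and the only version facts it encodes are the ones reported in this thread (0.8.0 shows the error, 0.7.1 is the suggested fallback).

```python
from importlib import metadata

FLAGGED = {"0.8.0"}   # version reported in this thread to raise the HMM error
SUGGESTED = "0.7.1"   # version suggested in this thread as a workaround


def check_pyhmmer(version=None):
    """Return a short status message for the given or installed version."""
    if version is None:
        try:
            # Look up the installed distribution without importing it.
            version = metadata.version("pyhmmer")
        except metadata.PackageNotFoundError:
            return "pyhmmer is not installed in this environment"
    if version in FLAGGED:
        return (
            f"pyhmmer {version} is the version flagged in this thread; "
            f"try: pip install pyhmmer=={SUGGESTED} -U"
        )
    return f"pyhmmer {version} is not the version flagged in this thread"


if __name__ == "__main__":
    print(check_pyhmmer())
```

Run it with the model_angelo conda environment activated so that it inspects the same installation the program uses.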

jamaliki commented 1 year ago

This is fixed now, as of v1.0.1.