running -c with custom config.json. Only want to change threshold but more arguments need defining

fergusonscripps commented 8 months ago

I am trying to run the following command with the standard values, changing only the threshold of the run. It fails at several stages of the program, mostly, I think, because the custom config.json needs further arguments. It would be great to get any advice on which extra args need to be defined, and where, if you want to run with a custom .json.

model_angelo build --volume-path X.mrc --fasta-path X.fasta --c config.json --o model

I keep getting errors about standard args not being defined, so I added them:

"box_size": 64, "save_ca_grid": true

were needed to get past the first iteration and seem to work for c_alpha/inference.py.

I then needed to add:

"batch_size":200, "fp16":false, "voxel_size":0.725

to get the GNN args to load,

but now I am stuck with the following error and am not sure how to fix it. I would appreciate any help. Thanks, and congrats on publishing your paper!

  File "/opt/applications/modelangelo/1.0.1/gnu/lib/python3.8/site-packages/model_angelo/apps/build.py", line 207, in main
    ca_cif_path = c_alpha_infer(ca_infer_args)
                  │             └ {'box_size': 64, 'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'stride': 16, 'dont_mask_input': True, 'th...
                  └ <function infer at 0x7f5d16840d30>

  File "/opt/applications/modelangelo/1.0.1/gnu/lib/python3.8/site-packages/model_angelo/c_alpha/inference.py", line 337, in infer
    os.path.join(args.output_path, "real_points.cif"), voxel_size * cas,
    │  │    │    │    │                                └ 1.5047170306151767
    │  │    │    │    └ 'model/see_alpha_output'
    │  │    │    └ {'box_size': 64, 'model_checkpoint': 'chkpt.torch', 'bfactor': 0, 'batch_size': 4, 'stride': 16, 'dont_mask_input': True, 'th...
    │  │    └ <function join at 0x7f5e1b8cbb80>
    │  └ <module 'posixpath' from '/opt/applications/python/3.8.3/gnu/lib/python3.8/posixpath.py'>
    └ <module 'os' from '/opt/applications/python/3.8.3/gnu/lib/python3.8/os.py'>

UnboundLocalError: local variable 'cas' referenced before assignment

{
  "standardize_mrc_args": {
    "target_voxel_size": 0.725,
    "crop_z": 0,
    "bfactor_to_apply": 0,
    "auto_mask": false
  },
  "ca_infer_args": {
    "box_size": 64,
    "model_checkpoint": "chkpt.torch",
    "bfactor": 0,
    "batch_size": 4,
    "stride": 16,
    "dont_mask_input": true,
    "threshold": 0.1,
    "save_real_coordinates": true,
    "save_cryo_em_grid": true,
    "do_nucleotides": false,
    "save_backbone_trace": true,
    "save_ca_grid": true,
    "save_output_grid": true,
    "crop": 6
  },
  "gnn_infer_args": {
    "batch_size": 200,
    "fp16": false,
    "voxel_size": 0.725,
    "num_rounds": 3,
    "crop_length": 200,
    "repeat_per_residue": 3,
    "esm_model": "esm1b_t33_650M_UR50S",
    "aggressive_pruning": false,
    "seq_attention_batch_size": 200
  }
}

fergusonscripps commented 8 months ago

Also, I forgot to say: I usually run this on a shared computational environment. I reran the command with -c on my own system and got similar errors to those on the shared environment. If I just change the default config.json there is no problem, but this can't be done on the shared environment. Thanks
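Since editing the default config.json works, one way to get a custom file without hand-listing every argument might be to copy the defaults and override only the value you care about. A minimal sketch, assuming you can read the installed default config.json: `apply_overrides` is a hypothetical helper, the `example_defaults` dict is a stand-in built from values in this thread (not the real defaults), and the new threshold of 0.2 is only an illustration.

```python
import copy
import json


def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a deep copy of `defaults` with `overrides` merged in, section by section."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in the same section are preserved.
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the installed default config.json. These values are copied from
# this thread and are NOT guaranteed to match the real defaults; in practice,
# load the real file with json.load() instead.
example_defaults = {
    "ca_infer_args": {"box_size": 64, "stride": 16, "threshold": 0.1, "crop": 6},
    "gnn_infer_args": {"num_rounds": 3, "crop_length": 200},
}

# Change only the threshold (0.2 is an illustrative value).
custom = apply_overrides(example_defaults, {"ca_infer_args": {"threshold": 0.2}})

# Serialize; write this string to your own config.json and pass it with -c.
custom_json = json.dumps(custom, indent=2)
```

Because the result starts from a full copy of the defaults, every argument the program expects is still present, and the file lives wherever you have write access.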

rbs-sci commented 8 months ago

I, too, would appreciate a comprehensive treatise on the options available in a custom config.json - going through with trial-and-error and restarting every time a new error pops up is not terribly efficient!

The defaults work for RELION-refined maps almost always (including some real edge-cases with thousands of predicted chains!) but the results I get from a fairly simple CryoSPARC-refined map are absolute nonsense.

ccgauvin94 commented 1 month ago

While it would be great if a functional config.json file could be provided, the source of your issue lies somewhere in this region:

      "save_real_coordinates": false,
      "save_cryo_em_grid": false,
      "do_nucleotides": true,
      "save_backbone_trace": false,
      "save_ca_grid": false,

Specifically (I haven't debugged any further), all the "save" options need to be set to false for it to run on my system. Otherwise, I get the error described.
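As a sketch of that workaround, the snippet below turns off every key starting with `save_` in the `ca_infer_args` section of an already-loaded config. `disable_save_flags` is a hypothetical helper, not part of model_angelo, and the example keys are the ones posted earlier in this thread; whether this is the right fix on other systems is untested.

```python
import copy


def disable_save_flags(config: dict, section: str = "ca_infer_args") -> dict:
    """Return a copy of `config` with every `save_*` key in `section` set to False."""
    patched = copy.deepcopy(config)
    for key in patched.get(section, {}):
        if key.startswith("save_"):
            patched[section][key] = False
    return patched


# Keys taken from the config posted above; other keys are left untouched.
example = {
    "ca_infer_args": {
        "threshold": 0.1,
        "save_real_coordinates": True,
        "save_cryo_em_grid": True,
        "do_nucleotides": False,
        "save_backbone_trace": True,
        "save_ca_grid": True,
        "save_output_grid": True,
    }
}

patched = disable_save_flags(example)
```

Note that `do_nucleotides` is not a "save" option, so the helper leaves it as it was.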

ccgauvin94 commented 1 month ago

As for which args need to be defined for GNN, you can see this here in build.py:

            gnn_infer_args = Args(config["gnn_infer_args"])
            gnn_infer_args.map = parsed_args.volume_path
            gnn_infer_args.protein_fasta = new_protein_fasta_path
            gnn_infer_args.rna_fasta = parsed_args.rna_fasta
            gnn_infer_args.dna_fasta = parsed_args.dna_fasta
            gnn_infer_args.struct = current_ca_cif_path
            gnn_infer_args.output_dir = current_output_dir
            gnn_infer_args.model_dir = gnn_model_logdir
            gnn_infer_args.device = parsed_args.device
            gnn_infer_args.write_hmm_profiles = False
            gnn_infer_args.refine = False
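Reading the excerpt above, the attributes assigned there (map, protein_fasta, struct, output_dir and so on) come from the command line or from earlier steps, so presumably anything else the GNN step reads has to be present in the "gnn_infer_args" section of config.json. To avoid finding missing keys one crash at a time, a pre-flight check along these lines may help. This is a sketch only: `AttrArgs` is a hypothetical stand-in for the real `Args` class (whose implementation is not shown in this thread), and `REPORTED_REQUIRED` lists only the keys people in this thread had to add, so it is almost certainly incomplete.

```python
class AttrArgs:
    """Hypothetical stand-in for `Args`: exposes dictionary keys as attributes."""

    def __init__(self, values: dict):
        self.__dict__.update(values)


# Keys reported as required in this thread. Assumption: incomplete list.
REPORTED_REQUIRED = {
    "ca_infer_args": ["box_size", "save_ca_grid"],
    "gnn_infer_args": ["batch_size", "fp16", "voxel_size"],
}


def missing_keys(config: dict, required: dict = REPORTED_REQUIRED) -> dict:
    """Map each section name to the required keys that `config` lacks."""
    missing = {}
    for section, keys in required.items():
        present = config.get(section, {})
        absent = [key for key in keys if key not in present]
        if absent:
            missing[section] = absent
    return missing


# A deliberately incomplete config for illustration.
example_config = {
    "ca_infer_args": {"threshold": 0.1},
    "gnn_infer_args": {"batch_size": 200, "num_rounds": 3},
}

problems = missing_keys(example_config)

# Attributes set after construction (as build.py does for `map`) exist on the
# object, but keys absent from the config do not, which is what later fails.
args = AttrArgs(example_config["gnn_infer_args"])
args.map = "X.mrc"
has_voxel_size = hasattr(args, "voxel_size")
```

Running the check before launching a build reports all known gaps at once; extend `REPORTED_REQUIRED` as further required keys turn up.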

3dem / model-angelo

running -c with custom config.json. Only want to change threshold but more arguments need defining #100