deepmodeling / Uni-Mol

Official Repository for the Uni-Mol Series Methods
MIT License

problems running demo files #236

Closed: CLG68 closed this issue 3 months ago

CLG68 commented 3 months ago

Hi, demo.sh and its variants pass this argument: --model-dir checkpoint_best.pt

This file does not exist. If I replace it with --model-dir ../weights/unimol_docking_v2_240517.pt, I get errors that seem to be linked to the fact that it tries to open files whose names are the individual letters of: ligand_predict

    (unimol) christian@christian-linux02:/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface$ ./demo1.sh
    Namespace(model_dir='../weights/unimol_docking_v2_240517.pt', input_protein='../example_data/protein.pdb', input_ligand='../example_data/ligand.sdf', input_batch_file='input_batch.csv', input_docking_grid='../example_data/docking_grid.json', output_ligand_name='ligand_predict', output_ligand_dir='predict_sdf', mode='single', batch_size=4, nthreads=8, conf_size=10, cluster=True, use_current_ligand_conf=False, steric_clash_fix=True)
    Start preprocessing data...
    Number of ligands: 1
    1it [00:01, 1.37s/it]
    Total num: 1, Success: 1, Failed: 0
    Done!
    2024-06-24 00:13:10 | INFO | unimol.inference | loading model(s) from ../weights/unimol_docking_v2_240517.pt
    2024-06-24 00:13:10 | INFO | unimol.tasks.docking_pose_v2 | ligand dictionary: 30 types
    2024-06-24 00:13:10 | INFO | unimol.tasks.docking_pose_v2 | pocket dictionary: 9 types
    2024-06-24 00:13:11 | INFO | unimol.inference | Namespace(no_progress_bar=False, log_interval=50, log_format='simple', tensorboard_logdir='', wandb_project='', wandb_name='', seed=1, cpu=False, fp16=True, bf16=False, bf16_sr=False, allreduce_fp32_grad=False, fp16_no_flatten_grads=False, fp16_init_scale=4, fp16_scale_window=256, fp16_scale_tolerance=0.0, min_loss_scale=0.0001, threshold_loss_scale=None, user_dir='/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol', empty_cache_freq=0, all_gather_list_size=16384, suppress_crashes=False, profile=False, ema_decay=-1.0, validate_with_ema=False, loss='docking_pose_v2', optimizer='adam', lr_scheduler='fixed', task='docking_pose_v2', num_workers=8, skip_invalid_size_inputs_valid_test=False, batch_size=4, required_batch_size_multiple=1, data_buffer_size=10, train_subset='train', valid_subset='ligand_predict', validate_interval=1, validate_interval_updates=0, validate_after_updates=0, fixed_validation_seed=None, disable_validation=False, batch_size_valid=4, max_valid_steps=None, curriculum=0, distributed_world_size=1, distributed_rank=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, device_id=0, distributed_no_spawn=False, ddp_backend='c10d', bucket_cap_mb=25, fix_batches_to_gpus=False, find_unused_parameters=False, fast_stat_sync=False, broadcast_buffers=False, nprocs_per_node=1, path='../weights/unimol_docking_v2_240517.pt', quiet=False, model_overrides='{}', results_path='/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predict_sdf', arch='docking_pose_v2', recycling=4, data='/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predict_sdf', finetune_mol_model=None, finetune_pocket_model=None, conf_size=10, dist_threshold=8.0, max_pocket_atoms=256, adam_betas='(0.9, 0.999)', adam_eps=1e-08, weight_decay=0.0, force_anneal=None, lr_shrink=0.1, warmup_updates=0, no_seed_provided=False, mol=Namespace(encoder_layers=15, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_attention_heads=64, dropout=0.1, emb_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, pooler_dropout=0.0, max_seq_len=512, activation_fn='gelu', pooler_activation_fn='tanh', post_ln=False, masked_token_loss=-1.0, masked_coord_loss=-1.0, masked_dist_loss=-1.0, x_norm_loss=-1.0, delta_pair_repr_norm_loss=-1.0), pocket=Namespace(encoder_layers=15, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_attention_heads=64, dropout=0.1, emb_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, pooler_dropout=0.0, max_seq_len=512, activation_fn='gelu', pooler_activation_fn='tanh', post_ln=False, masked_token_loss=-1.0, masked_coord_loss=-1.0, masked_dist_loss=-1.0, x_norm_loss=-1.0, delta_pair_repr_norm_loss=-1.0), encoder_layers=15, encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_attention_heads=64, dropout=0.1, emb_dropout=0.1, attention_dropout=0.1, activation_dropout=0.0, pooler_dropout=0.0, max_seq_len=512, activation_fn='gelu', pooler_activation_fn='tanh', post_ln=False, masked_token_loss=-1.0, masked_coord_loss=-1.0, masked_dist_loss=-1.0, x_norm_loss=-1.0, delta_pair_repr_norm_loss=-1.0, distributed_num_procs=1)
    2024-06-24 00:13:11 | INFO | unicore.tasks.unicore_task | get EpochBatchIterator for epoch 1
    2024-06-24 00:13:14 | INFO | unimol.inference | Done inference!
    Start converting model predictions into sdf files...
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2032.12it/s]
    Done!
    0%| | 0/1 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file d
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file t
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file e
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file p
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file r
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file c
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file i
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file _
    5it [00:01, 3.34it/s]
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file s
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file d
    [00:13:19] Counts line too short: '' on line4
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 514, in single_refine
        Chem.SanitizeMol(in_lig, sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL, catchErrors=True)
    Boost.Python.ArgumentError: Python argument types in
        rdkit.Chem.rdmolops.SanitizeMol(NoneType)
    did not match C++ signature:
        SanitizeMol(RDKit::ROMol {lvalue} mol, unsigned long sanitizeOps=rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_ALL, bool catchErrors=False)
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file f
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file l
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file a
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file i
    9it [00:03, 2.97it/s]
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file g
    15it [00:03, 5.70it/s]
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file _
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file p
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file n
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file d
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file r
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file d
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file i
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file e
    22it [00:05, 5.32it/s]
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file c
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 603, in <module>
        single_refine(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/unimol/scripts/6tsr.py", line 513, in single_refine
        in_lig = Chem.MolFromMolFile(input_ligand_path, sanitize=False)
    OSError: Bad input file t
    26it [00:06, 4.02it/s]
    output ligand path: predict_sdf/ligand_predict.sdf
    total time: 15.468194723129272 sec.
    All processes done!

ZhouGengmo commented 3 months ago

This bug has been fixed through issue https://github.com/dptech-corp/Uni-Mol/issues/230. Please pull the latest code.
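
The fix itself is not shown in this thread, but the one-character file names in the log above (`d`, `t`, `e`, ...) are the usual symptom of a single path string being iterated where a list of paths was expected. The following is a minimal sketch of that failure mode, assuming this is what happened; `as_path_list` and `refine_inputs` are hypothetical names and are not taken from the Uni-Mol code.

```python
from typing import Iterable, List, Union


def as_path_list(paths: Union[str, Iterable[str]]) -> List[str]:
    """Wrap a lone path string in a list; copy any other iterable as-is."""
    if isinstance(paths, str):
        return [paths]
    return list(paths)


def refine_inputs(paths: Union[str, Iterable[str]]) -> List[str]:
    """Hypothetical stand-in for a loop that opens one file per entry."""
    return [p for p in as_path_list(paths)]


# Iterating a bare string yields its characters, one "file name" each.
buggy = [p for p in "predict_sdf/ligand_predict"]
# Normalising first keeps the path in one piece.
fixed = refine_inputs("predict_sdf/ligand_predict")

print(len(buggy), buggy[:4])
print(fixed)
```

The string `predict_sdf/ligand_predict` has 26 characters, which matches the `26it` counter at the end of the log.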

CLG68 commented 3 months ago

Now I can run this without problems:

    python demo.py --mode single --conf-size 10 --cluster \
        --input-protein ../example_data/protein.pdb \
        --input-ligand ../example_data/ligand.sdf \
        --input-docking-grid ../example_data/docking_grid.json \
        --output-ligand-name ligand_predict \
        --output-ligand-dir predict_sdf \
        --steric-clash-fix \
        --model-dir ../weights/unimol_docking_v2_240517.pt

However, when I tried to store my files elsewhere, I started having problems: I cannot run with a full path and have to stay with a relative path. I was not able to run the script with my own files, so I moved ../example_data/ligand.sdf from the demo instead.

This works: --input-ligand ../Targets/ligand.sdf \

but this does not: --input-ligand /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf \

it gave me:

    (unimol) christian@christian-linux02:/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface$ ./demo3.sh
    Namespace(model_dir='../weights/unimol_docking_v2_240517.pt', input_protein='../example_data/protein.pdb', input_ligand='/media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf', input_batch_file='input_batch.csv', input_docking_grid='../example_data/docking_grid.json', output_ligand_name='ligand_predict', output_ligand_dir='predict_sdf', mode='single', batch_size=4, nthreads=8, conf_size=10, cluster=True, use_current_ligand_conf=False, steric_clash_fix=True)
    Traceback (most recent call last):
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/demo.py", line 189, in <module>
        main_cli()
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/demo.py", line 185, in main_cli
        main(args)
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/demo.py", line 24, in main
        output_ligand) = clf.predict_sdf(
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/unimol_predictor.py", line 76, in predict_sdf
        output_pkl, output_lmdb = self.predict(input_protein,
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/unimol_predictor.py", line 38, in predict
        lmdb_name = self.preprocess(input_protein, input_ligand, input_docking_grid, output_ligand_name, output_ligand_dir)
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/unimol_predictor.py", line 27, in preprocess
        processed_data = preprocessor.preprocess(input_protein, input_ligand, input_docking_grid, output_ligand_name, output_ligand_dir)
      File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/processor.py", line 46, in preprocess
        supp = Chem.SDMolSupplier(input_ligand)
    OSError: File error: Bad input file /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf

CLG68 commented 3 months ago

same thing with the protein file... The full path

    --input-protein /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data \

gives a different error such as:

TypeError: cannot unpack non-iterable NoneType object

and generates this file: failedpocket.txt containing: / m e d i a / c h r i s t i a n / V S 1 / V S / V S _ U n i - M o l / u n i m o l _ d o c k i n g _ v 2 / e x a m p l e _ d a t a

ZhouGengmo commented 3 months ago

> This works: --input-ligand ../Targets/ligand.sdf \
> but this does not: --input-ligand /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf \

I used the absolute path of the sdf file from the example data as input-ligand, and failed to reproduce your error.

> File "/media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/interface/predictor/processor.py", line 46, in preprocess
>     supp = Chem.SDMolSupplier(input_ligand)
> OSError: File error: Bad input file /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf

Based on the error message, the sdf file path is complete, and Chem.SDMolSupplier can accept absolute paths, but the file cannot be read from this path. Are you sure this path is correct? Or could you provide the sdf file you are using?

> same thing with the protein file... The full path
>
>     --input-protein /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data \

--input-protein needs to be a pdb file, not a directory
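
Both failures come down to the path handed to the script. The ligand path in the error contains `VS1/VS/VS/VS_Uni-Mol`, while the shell prompt shows the working directory under `VS1/VS/VS_Uni-Mol`, so the extra `VS/` segment may be why the file cannot be read. The protein argument names a directory rather than a file. A small pre-flight check can surface either problem before RDKit raises `Bad input file`. The sketch below uses only the Python standard library; `check_input_file` is a hypothetical helper and not part of Uni-Mol.

```python
import os
import tempfile


def check_input_file(path: str, suffix: str) -> str:
    """Return "ok", or a short description of why `path` is unusable."""
    if os.path.isdir(path):
        return f"{path} is a directory, expected a {suffix} file"
    if not os.path.exists(path):
        # Walk up to the deepest ancestor that does exist, which shows
        # where the path first goes wrong.
        parent = os.path.dirname(os.path.abspath(path))
        while not os.path.exists(parent):
            parent = os.path.dirname(parent)
        return f"{path} not found; deepest existing directory is {parent}"
    if not path.lower().endswith(suffix):
        return f"{path} does not end with {suffix}"
    return "ok"


# Demonstration on a throwaway directory (no real project paths assumed).
with tempfile.TemporaryDirectory() as root:
    ligand = os.path.join(root, "ligand.sdf")
    with open(ligand, "w") as handle:
        handle.write("")
    print(check_input_file(ligand, ".sdf"))
    print(check_input_file(root, ".pdb"))
    print(check_input_file(os.path.join(root, "VS", "ligand.sdf"), ".sdf"))
```

Running the same check on the `--input-ligand` and `--input-protein` values before calling `demo.py` would report the first path component that does not exist, or that a directory was passed.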

CLG68 commented 3 months ago

If I run this, it works:

    python demo.py --mode single --conf-size 10 --cluster \
        --input-protein /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data/protein.pdb \
        --input-ligand /media/christian/VS1/VS/VS_Uni-Mol/unimol_docking_v2/example_data/ligand.sdf \
        --input-docking-grid ../example_data/docking_grid.json \
        --output-ligand-name ligand_predict \
        --output-ligand-dir predict_sdf \
        --steric-clash-fix \
        --model-dir ../weights/unimol_docking_v2_240517.pt

So it seems to be OK with a full path. For protein.pdb, the argument should have pointed directly to the file and not to the directory. But I'm not sure what happened when I changed only the ligand from --input-ligand ../Targets/ligand.sdf \ to --input-ligand /media/christian/VS1/VS/VS/VS_Uni-Mol/unimol_docking_v2/Targets/ligand.sdf \

As long as it works with the demo files, it should work with mine. I'll figure it out; it is probably just a detail. I ran 134 ligands against 1 receptor using relative paths and it worked well, at around 3.5 sec/pose. The poses are very similar to what I get with DiffBindFR, which is 20-40x slower. Thank you very much for your help and this very useful tool.