Closed MachineGUN001 closed 5 months ago
You could replace self.keys() with self.keys. Thanks!
The generated molecule will be saved as an SDF file; you can find the save path in the code.
@Layne-Huang thanks for your kind explanation.
Instead, I upgraded PyTorch Geometric to version 2.4.0, and that solved the problem as well. Could this be a consequence of PyTorch Geometric version differences?
Yes, PyG has many incompatibilities between versions after 2.0, which is why I also provide the EGNN code for PyG both before and after 2.3.0.
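The keys fix discussed above can also be written so that it tolerates either behaviour. This is a minimal sketch, assuming the error arises because keys is a property in some PyTorch Geometric releases and a method in others. The helper get_keys and the two stand-in classes are hypothetical and only illustrate the pattern; they are not part of the repository or of PyG.

```python
def get_keys(data):
    """Return the keys of `data` whether `keys` is a method or a property."""
    keys = data.keys
    # If `keys` is callable it is a method and must be called;
    # otherwise it is already the collection of key names.
    return list(keys() if callable(keys) else keys)


class PropertyStyle:
    """Hypothetical stand-in for an object whose `keys` is a property."""

    @property
    def keys(self):
        return ["pos", "x"]


class MethodStyle:
    """Hypothetical stand-in for an object whose `keys` is a method."""

    def keys(self):
        return ["pos", "x"]


print(get_keys(PropertyStyle()))
print(get_keys(MethodStyle()))
```

Inside a class, the same check would apply to self.keys, so the code would not need to be edited again after a PyG upgrade or downgrade.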
Many thanks again. I'll close the issue!
@Layne-Huang
Sorry for another question.
According to the code that saves molecules, each newly generated molecule is saved as a separate SDF file: each time a new valid molecule is generated, the script calls the save_sdf function and creates a new filename for that molecule.
But I have been running the script for about 20 hours without seeing a single SDF file generated. Is there a problem with the PDB protein file or the ligand SDF file I am using? Can you provide example files, including a protein pocket file (.pdb) and a ligand file (.sdf)?
many thanks,
I have tested the code again and found no issue. Please check this code and try printing your save path:
    if save_sdf_flag:
        print('save')
        gen_file_name = '{}_{}.sdf'.format(pdb_name, str(num_samples))
        print(gen_file_name)
        save_sdf(gmol, sdf_dir, gen_file_name)
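To see exactly where a file would land, the filename from the snippet above can be joined with the output directory and resolved to an absolute path. This is a sketch only: build_sdf_path is a hypothetical helper, and the example values for sdf_dir and pdb_name are assumptions chosen to resemble the logs in this thread.

```python
import os


def build_sdf_path(sdf_dir, pdb_name, num_samples):
    """Return the absolute path an SDF file with this name would be written to."""
    # Same filename pattern as the snippet in the thread.
    gen_file_name = '{}_{}.sdf'.format(pdb_name, str(num_samples))
    return os.path.abspath(os.path.join(sdf_dir, gen_file_name))


# Example values are assumptions, not taken from the repository.
path = build_sdf_path('data/generate_ref', '8h6tcut6_pocket', 0)
print('would save to:', path)
print('directory exists:', os.path.isdir(os.path.dirname(path)))
```

Printing the absolute path removes any doubt about the working directory, which matters when the script is launched from a notebook.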
Thank you for checking the code.
Here is some of the output after interrupting the script:
^C
'module' is not recognized as an internal or external command,
operable program or batch file.
[2024-04-08 15:34:44,452::test::INFO] Namespace(pdb_path='protein/pro_A_YG1.pdb', sdf_path=None, num_atom=50, build_method='reconstruct', config=None, cuda=True, ckpt='ckpt/500.pt', save_traj=False, num_samples=50, batch_size=3, resume=None, tag='', clip=1000.0, n_steps=1000, global_start_sigma=inf, w_global_pos=1.0, w_local_pos=1.0, w_global_node=1.0, w_local_node=1.0, sampling_type='generalized', eta=1.0)
[2024-04-08 15:34:44,453::test::INFO] {'model': {'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}, 'train': {'seed': 2021, 'batch_size': 16, 'val_freq': 250, 'max_iters': 500, 'max_grad_norm': 10.0, 'num_workers': 4, 'anneal_power': 2.0, 'optimizer': {'type': 'adam', 'lr': 0.001, 'weight_decay': 0.0, 'beta1': 0.95, 'beta2': 0.999}, 'scheduler': {'type': 'plateau', 'factor': 0.6, 'patience': 10, 'min_lr': 1e-06}, 'transform': {'mask': {'type': 'mixed', 'min_ratio': 0.0, 'max_ratio': 1.2, 'min_num_masked': 1, 'min_num_unmasked': 0, 'p_random': 0.5, 'p_bfs': 0.25, 'p_invbfs': 0.25}, 'contrastive': {'num_real': 50, 'num_fake': 50, 'pos_real_std': 0.05, 'pos_fake_std': 2.0}}}, 'dataset': {'name': 'crossdock', 'type': 'pl', 'path': './data/crossdocked_pocket10', 'split': './data/split_by_name.pt'}}
[2024-04-08 15:34:44,453::test::INFO] Loading crossdock data...
[2024-04-08 15:34:44,455::test::INFO] Loading data...
[2024-04-08 15:34:45,288::test::INFO] Building model...
[2024-04-08 15:34:45,289::test::INFO] MDM_full_pocket_coor_shared
0%| | 0/33 [00:00<?, ?it/s]
45%|████▌ | 15/33 [00:00<00:00, 145.18it/s]
100%|██████████| 33/33 [00:00<00:00, 164.19it/s]
100%|██████████| 33/33 [00:00<00:00, 161.31it/s]
0%| | 0/33 [00:00<?, ?it/s]
sample: 0it [00:00, ?it/s][A
sample: 1it [00:16, 16.47s/it][A
sample: 2it [00:30, 15.21s/it][A
sample: 3it [00:45, 14.88s/it][A
sample: 4it [00:59, 14.66s/it][A
sample: 5it [01:14, 14.57s/it][A
sample: 6it [01:28, 14.52s/it][A
sample: 7it [01:42, 14.46s/it][A
sample: 8it [01:57, 14.41s/it][A
.....
sample: 279it [1:06:58, 14.44s/it][A
sample: 280it [1:07:13, 14.44s/it][A
sample: 281it [1:07:27, 14.44s/it][A
sample: 282it [1:07:42, 14.46s/it][A
sample: 283it [1:07:56, 14.47s/it][A
Entropy of n_nodes: H[N] -1.3862943649291992
Entropy of n_nodes: H[N] -3.543935775756836
{'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}
sdf idr: protein\generate_ref
Entropy of n_nodes: H[N] -3.543935775756836
1
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
1
It looks like not a single new molecule, and no corresponding SDF file, was generated after running for 22 hours.
I'm not sure whether the PDB file is suitable for sample_for_pdb.py. Could you please provide an example pocket file (.pdb) and ligand file (.sdf)? Many thanks,
Best,
Please use this as an example: https://drive.google.com/file/d/12IQ2Pqah7Kw5yJgUfK-4ojFXZTv2_CB4/view?usp=drive_link.
I used the split_pocket_ligand.py script to split the protein 7l11.pdb, which generated two files: 7l11cut20_ligand.pdb and 7l11cut20_pocket.pdb.
Then I ran the command line below:
!python -u sample_for_pdb.py \
--ckpt ckpt/500.pt \
--pdb_path data/7l11cut20/7l11cut20_pocket.pdb \
--num_atom 20 \
--num_samples 10 \
--batch_size 5 \
--sampling_type generalized
The split script produces a pocket file with a 20 Å cutoff.
If I use this PDB file with the command line above, should that work?
Thanks a lot for your help. I'll check it further.
Best,
It should work, but 20 Å is still a large pocket. You could try a smaller cutoff, such as 10 Å or 6 Å.
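To illustrate why the cutoff matters, here is a minimal sketch of distance-based pocket selection. It is not the repository's split_pocket_ligand.py; atoms_within_cutoff and the toy coordinates are hypothetical, and real pocket extraction usually keeps whole residues rather than single atoms.

```python
import math


def atoms_within_cutoff(protein_atoms, ligand_atoms, cutoff):
    """Keep protein atoms within `cutoff` angstroms of any ligand atom."""
    return [
        p for p in protein_atoms
        if any(math.dist(p, l) <= cutoff for l in ligand_atoms)
    ]


# Toy coordinates in angstroms (made up for illustration).
ligand = [(0.0, 0.0, 0.0)]
protein = [(3.0, 0.0, 0.0), (8.0, 0.0, 0.0), (15.0, 0.0, 0.0)]

for cutoff in (6.0, 10.0, 20.0):
    kept = atoms_within_cutoff(protein, ligand, cutoff)
    print('cutoff', cutoff, '->', len(kept), 'atoms kept')
```

A larger cutoff keeps more protein atoms, so the model has a larger pocket to process at every sampling step.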
Got it! A smaller pocket should take less time to run.
@Layne-Huang
Sorry to bother you again about the same problem.
I ran the command line below with the PDB file you provided.
!python -u sample_for_pdb.py \
--ckpt ckpt/500.pt \
--pdb_path data/8h6tcut6_pocket.pdb \
--num_atom 50 \
--num_samples 50 \
--batch_size 8 \
--sampling_type generalized
After running the script, however, no SDF files with newly generated molecules were produced.
Entropy of n_nodes: H[N] -1.3862943649291992
Entropy of n_nodes: H[N] -3.543935775756836
{'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}
sdf idr: data\generate_ref
Entropy of n_nodes: H[N] -3.543935775756836
****
1
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
'module' is not recognized as an internal or external command,
operable program or batch file.
[2024-04-09 13:54:35,086::test::INFO] Namespace(batch_size=8, build_method='reconstruct', ckpt='ckpt/500.pt', clip=1000.0, config=None, cuda=True, eta=1.0, global_start_sigma=inf, n_steps=1000, num_atom=50, num_samples=50, pdb_path='data/8h6tcut6_pocket.pdb', resume=None, sampling_type='generalized', save_traj=False, sdf_path=None, tag='', w_global_node=1.0, w_global_pos=1.0, w_local_node=1.0, w_local_pos=1.0)
[2024-04-09 13:54:35,086::test::INFO] {'model': {'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}, 'train': {'seed': 2021, 'batch_size': 16, 'val_freq': 250, 'max_iters': 500, 'max_grad_norm': 10.0, 'num_workers': 4, 'anneal_power': 2.0, 'optimizer': {'type': 'adam', 'lr': 0.001, 'weight_decay': 0.0, 'beta1': 0.95, 'beta2': 0.999}, 'scheduler': {'type': 'plateau', 'factor': 0.6, 'patience': 10, 'min_lr': 1e-06}, 'transform': {'mask': {'type': 'mixed', 'min_ratio': 0.0, 'max_ratio': 1.2, 'min_num_masked': 1, 'min_num_unmasked': 0, 'p_random': 0.5, 'p_bfs': 0.25, 'p_invbfs': 0.25}, 'contrastive': {'num_real': 50, 'num_fake': 50, 'pos_real_std': 0.05, 'pos_fake_std': 2.0}}}, 'dataset': {'name': 'crossdock', 'type': 'pl', 'path': './data/crossdocked_pocket10', 'split': './data/split_by_name.pt'}}
[2024-04-09 13:54:35,087::test::INFO] Loading crossdock data...
[2024-04-09 13:54:35,088::test::INFO] Loading data...
[2024-04-09 13:54:35,234::test::INFO] Building model...
[2024-04-09 13:54:35,235::test::INFO] MDM_full_pocket_coor_shared
0%| | 0/12 [00:00<?, ?it/s]
50%|█████ | 6/12 [00:00<00:00, 57.45it/s]
100%|██████████| 12/12 [00:00<00:00, 59.16it/s]
0%| | 0/12 [00:00<?, ?it/s]
sample: 0it [00:00, ?it/s][A
sample: 1it [00:00, 4.22it/s][A
sample: 2it [00:00, 5.87it/s][A
sample: 3it [00:00, 6.89it/s][A
sample: 4it [00:00, 7.43it/s][A
sample: 5it [00:00, 7.79it/s][A
sample: 6it [00:00, 8.02it/s][A
****
sample: 1000it [03:42, 5.16it/s][A
sample: 1000it [03:42, 4.50it/s]
==============================
*** Open Babel Warning in OpenBabel::OBMol::PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders
==============================
*** Open Babel Warning in OpenBabel::OBMol::PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders
100%|██████████| 12/12 [43:10<00:00, 223.90s/it]
100%|██████████| 12/12 [43:10<00:00, 215.88s/it]
[2024-04-09 14:37:48,710::test::INFO] valid:0
[2024-04-09 14:37:48,710::test::INFO] stable:0
Could you please suggest how to fix this?
many thanks,
Please decrease the size of the molecules. It will generate valid molecules if you generate molecules with no more than 40 atoms.
@Layne-Huang
I decreased --num_atom to 20 using the command line below:
!python -u sample_for_pdb.py \
--ckpt ckpt/500.pt \
--pdb_path data/8h6tcut6_pocket.pdb \
--num_atom 20 \
--num_samples 25 \
--batch_size 10 \
--sampling_type generalized
But the same error occurred.
Entropy of n_nodes: H[N] -1.3862943649291992
Entropy of n_nodes: H[N] -3.543935775756836
{'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}
sdf idr: data\generate_ref
Entropy of n_nodes: H[N] -3.543935775756836
1
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
1
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
Invalid,continue
'module' is not recognized as an internal or external command,
operable program or batch file.
[2024-04-09 22:43:18,293::test::INFO] Namespace(batch_size=10, build_method='reconstruct', ckpt='ckpt/500.pt', clip=1000.0, config=None, cuda=True, eta=1.0, global_start_sigma=inf, n_steps=1000, num_atom=20, num_samples=25, pdb_path='data/7I11_pocket_8A.pdb', resume=None, sampling_type='generalized', save_traj=False, sdf_path=None, tag='', w_global_node=1.0, w_global_pos=1.0, w_local_node=1.0, w_local_pos=1.0)
[2024-04-09 22:43:18,294::test::INFO] {'model': {'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}, 'train': {'seed': 2021, 'batch_size': 16, 'val_freq': 250, 'max_iters': 500, 'max_grad_norm': 10.0, 'num_workers': 4, 'anneal_power': 2.0, 'optimizer': {'type': 'adam', 'lr': 0.001, 'weight_decay': 0.0, 'beta1': 0.95, 'beta2': 0.999}, 'scheduler': {'type': 'plateau', 'factor': 0.6, 'patience': 10, 'min_lr': 1e-06}, 'transform': {'mask': {'type': 'mixed', 'min_ratio': 0.0, 'max_ratio': 1.2, 'min_num_masked': 1, 'min_num_unmasked': 0, 'p_random': 0.5, 'p_bfs': 0.25, 'p_invbfs': 0.25}, 'contrastive': {'num_real': 50, 'num_fake': 50, 'pos_real_std': 0.05, 'pos_fake_std': 2.0}}}, 'dataset': {'name': 'crossdock', 'type': 'pl', 'path': './data/crossdocked_pocket10', 'split': './data/split_by_name.pt'}}
[2024-04-09 22:43:18,294::test::INFO] Loading crossdock data...
[2024-04-09 22:43:18,295::test::INFO] Loading data...
[2024-04-09 22:43:18,467::test::INFO] Building model...
[2024-04-09 22:43:18,468::test::INFO] MDM_full_pocket_coor_shared
0%| | 0/5 [00:00<?, ?it/s]
100%|██████████| 5/5 [00:00<00:00, 147.06it/s]
0%| | 0/5 [00:00<?, ?it/s]
sample: 0it [00:00, ?it/s][A
sample: 1it [00:00, 3.53it/s][A
sample: 2it [00:00, 4.28it/s][A
sample: 3it [00:00, 4.39it/s][A
sample: 4it [00:00, 4.54it/s][A
sample: 5it [00:01, 4.71it/s][A
sample: 6it [00:01, 4.79it/s][A
sample: 7it [00:01, 4.66it/s][A
sample: 8it [00:01, 4.70it/s][A
sample: 9it [00:01, 4.77it/s][A
sample: 10it [00:02, 4.80it/s][A
sample: 11it [00:02, 4.80it/s][A
---
sample: 41it [00:08, 4.77it/s][A
sample: 42it [00:09, 4.83it/s][A
sample: 43it [00:09, 4.91it/s][A
---
sample: 996it [08:36, 2.17it/s][A
sample: 997it [08:37, 2.19it/s][A
sample: 998it [08:37, 2.20it/s][A
sample: 999it [08:38, 2.17it/s][A
sample: 1000it [08:38, 2.17it/s][A
sample: 1000it [08:38, 1.93it/s]
==============================
*** Open Babel Warning in OpenBabel::OBMol::PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders
==============================
*** Open Babel Warning in OpenBabel::OBMol::PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders
100%|██████████| 5/5 [40:19<00:00, 502.20s/it]
100%|██████████| 5/5 [40:19<00:00, 483.87s/it]
[2024-04-09 23:23:40,623::test::INFO] valid:0
[2024-04-09 23:23:40,624::test::INFO] stable:0
After running the script, no SDF file was generated in the generate_ref folder.
I'm not sure what the problem is. Many thanks for your help.
This is my command: python -u sample_for_pdb.py --ckpt 500.pt --pdb_path data/8h6tcut6/8h6tcut6_pocket.pdb --num_atom 20 --num_samples 10 --sampling_type generalized
This is an example of the generated molecules:
Please replace the code
    except(RuntimeError, MolReconsError, TypeError, IndexError,
           OverflowError):  # MolReconsError,TypeError,IndexError,OverflowError
        print('Invalid,continue')
with
    except (RuntimeError, MolReconsError, TypeError, IndexError, OverflowError) as e:
        print('An error occurred:', str(e))
        traceback.print_exc()  # needs "import traceback" at the top of the script
to see which specific error you are hitting.
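The replacement above can be tried in isolation. The sketch below is self-contained and uses stand-ins: the MolReconsError class and the reconstruct function are hypothetical placeholders for the repository's own, and the failure pattern is invented. It also counts failures by exception type, which is an addition for illustration and not part of the suggested change.

```python
import traceback
from collections import Counter


class MolReconsError(Exception):
    """Hypothetical stand-in for the repository's reconstruction error."""


def reconstruct(i):
    """Hypothetical reconstruction step that fails in different ways."""
    if i % 3 == 0:
        raise MolReconsError('could not assign bonds')
    if i % 3 == 1:
        raise IndexError('atom index out of range')
    return 'mol_{}'.format(i)


failures = Counter()
valid = []
for i in range(6):
    try:
        valid.append(reconstruct(i))
    except (RuntimeError, MolReconsError, TypeError, IndexError, OverflowError) as e:
        # Record which exception type occurred and show the full stack trace.
        failures[type(e).__name__] += 1
        traceback.print_exc()

print('valid:', len(valid))
print('failures by type:', dict(failures))
```

A tally like this shows at a glance whether every sample fails for the same reason or for several different ones.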
@Layne-Huang thank you so much for the suggestions.
Following your code, I revised the error-printing code in the script sample_batch.py.
After running it, the same problem occurred as before. For more details, please see the attached file with the output: outputs.txt
I look forward to your help.
many thanks,
Hi Layne,
Thank you so much for providing such amazing work!
When I try to run sample_for_pdb.py, the error below occurs.
The command line I use for running:
The error info:
My OS is Windows 10, with Python 3.9.
Pytorch version = 1.13.1+cu117
Pytorch Geometric version = 2.3.1
CUDA version = 11.7
CUDA available = True
Random Pytorch test tensor = tensor([0.7748])
Could you please suggest how to fix it?
By the way, is the generated molecule saved as an SDF file, and is it possible to set the path where the file is stored?
many thanks,
Best,
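Because the problem in this thread turned out to depend on library versions, an environment summary like the one in the post above can be collected without importing the libraries themselves. This is a sketch under assumptions: report_versions is a hypothetical helper, and the distribution names 'torch' and 'torch-geometric' are assumed to match what is installed.

```python
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = 'not installed'
    return versions


# Distribution names are assumptions; adjust them to your environment.
for name, version in report_versions(['torch', 'torch-geometric']).items():
    print('{} = {}'.format(name, version))
```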