Layne-Huang / PMDM


Unexpected Results in Sampling for Custom Pockets #21

Open nickyoungforu opened 2 months ago

nickyoungforu commented 2 months ago

I generate molecules using the following command:

python -u sample_for_pdb.py --ckpt checkpoints/500.pt --pdb_path test_data/3ug2.pdb --num_atom 50 --num_samples 2 --sampling_type generalized --build_method build

Why are all the generated molecules unstable?

generated smile: [C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C]#[C].[C]C[C][C][O].[O]
generated smile: [C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C].[C]#[C].[N].[N].[O][O]
100%|██████████| 1/1 [00:36<00:00, 36.71s/it]
[2024-05-06 16:16:47,651::test::INFO] valid:2
[2024-05-06 16:16:47,651::test::INFO] stable:0

Layne-Huang commented 2 months ago

Hi, most of the molecules in the training set have no more than 30 atoms, so it is hard for the pretrained model to generate a larger molecule. Please decrease the "num_atom" argument to 30 or fewer.
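A minimal sketch of guarding against an oversized `num_atom` before launching a run. The flag names are the ones used in the commands in this thread; `build_sample_command` and `MAX_RECOMMENDED_ATOMS` are hypothetical names introduced for illustration, and the limit of 30 is taken from the advice above, not read from the PMDM code.

```python
import shlex

# Assumption: 30 comes from the maintainer's advice in this thread,
# not from any constant in the PMDM repository.
MAX_RECOMMENDED_ATOMS = 30


def build_sample_command(ckpt, pdb_path, num_atom, num_samples,
                         sampling_type="generalized", build_method=None):
    """Return the argv list for sample_for_pdb.py, refusing an oversized num_atom."""
    if num_atom > MAX_RECOMMENDED_ATOMS:
        raise ValueError(
            f"num_atom={num_atom} exceeds the recommended maximum "
            f"of {MAX_RECOMMENDED_ATOMS}"
        )
    cmd = [
        "python", "-u", "sample_for_pdb.py",
        "--ckpt", str(ckpt),
        "--pdb_path", str(pdb_path),
        "--num_atom", str(num_atom),
        "--num_samples", str(num_samples),
        "--sampling_type", sampling_type,
    ]
    # --build_method is optional; omit it to keep the script's default.
    if build_method is not None:
        cmd += ["--build_method", build_method]
    return cmd


if __name__ == "__main__":
    cmd = build_sample_command("checkpoints/500.pt", "test_data/3ug2.pdb",
                               num_atom=30, num_samples=2, build_method="build")
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root if you prefer to launch the script from Python.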

Tianjl9 commented 1 month ago

Hi, I tried to run sample_for_pdb.py. The command line I used:

python -u sample_for_pdb.py --ckpt /wuqi/stu/tianjl/500.pt --pdb_path my/1h00/1h00cut20/1h00cut20_pocket.pdb --num_atom 30 --num_samples 10 --sampling_type generalized

The error info:

sh: 1: module: not found
Entropy of n_nodes: H[N] -1.3862943649291992
[2024-05-08 04:10:40,210::test::INFO] Namespace(pdb_path='my/1h00/1h00cut20/1h00cut20_pocket.pdb', sdf_path=None, num_atom=30, build_method='reconstruct', config=None, cuda=True, ckpt='/wuqi/stu/tianjl/500.pt', save_sdf=True, num_samples=10, batch_size=10, resume=None, tag='', clip=1000.0, n_steps=1000, global_start_sigma=inf, w_global_pos=1.0, w_local_pos=1.0, w_global_node=1.0, w_local_node=1.0, sampling_type='generalized', eta=1.0)
[2024-05-08 04:10:40,211::test::INFO] {'model': {'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}, 'train': {'seed': 2021, 'batch_size': 16, 'val_freq': 250, 'max_iters': 500, 'max_grad_norm': 10.0, 'num_workers': 4, 'anneal_power': 2.0, 'optimizer': {'type': 'adam', 'lr': 0.001, 'weight_decay': 0.0, 'beta1': 0.95, 'beta2': 0.999}, 'scheduler': {'type': 'plateau', 'factor': 0.6, 'patience': 10, 'min_lr': 1e-06}, 'transform': {'mask': {'type': 'mixed', 'min_ratio': 0.0, 'max_ratio': 1.2, 'min_num_masked': 1, 'min_num_unmasked': 0, 'p_random': 0.5, 'p_bfs': 0.25, 'p_invbfs': 0.25}, 'contrastive': {'num_real': 50, 'num_fake': 50, 'pos_real_std': 0.05, 'pos_fake_std': 2.0}}}, 'dataset': {'name': 'crossdock', 'type': 'pl', 'path': './data/crossdocked_pocket10', 'split': './data/split_by_name.pt'}}
[2024-05-08 04:10:40,211::test::INFO] Loading crossdock data...
Entropy of n_nodes: H[N] -3.543935775756836
[2024-05-08 04:10:40,212::test::INFO] Loading data...
[2024-05-08 04:10:40,518::test::INFO] Building model...
[2024-05-08 04:10:40,518::test::INFO] MDM_full_pocket_coor_shared {'type': 'diffusion', 'network': 'MDM_full_pocket_coor_shared', 'hidden_dim': 128, 'protein_hidden_dim': 128, 'num_convs': 3, 'num_convs_local': 3, 'protein_num_convs': 2, 'cutoff': 3.0, 'g_cutoff': 6.0, 'encoder_cutoff': 6.0, 'time_emb': True, 'atom_num_emb': False, 'mlp_act': 'relu', 'beta_schedule': 'sigmoid', 'beta_start': 1e-07, 'beta_end': 0.002, 'num_diffusion_timesteps': 1000, 'edge_order': 3, 'edge_encoder': 'mlp', 'smooth_conv': False, 'num_layer': 9, 'feats_dim': 5, 'soft_edge': True, 'norm_coors': True, 'm_dim': 128, 'context': 'None', 'vae_context': False, 'num_atom': 10, 'protein_feature_dim': 31}
sdf idr: my/1h00/1h00cut20/generate_ref
Entropy of n_nodes: H[N] -3.543935775756836
100%|██████████| 2/2 [00:00<00:00, 7.21it/s]
0%| | 0/2 [00:00<?, ?it/s]
1 sample: 1000it [03:26, 4.83it/s]
/wuqi/stu/tianjl/PMDM/sample_for_pdb.py:390: DeprecationWarning: np.long is a deprecated alias for np.compat.long. To silence this warning, use np.compat.long by itself. In the likely event your code does not need to work on Python 2 you can use the builtin int for which np.compat.long is itself an alias. Doing this will not modify any behaviour and is safe. When replacing np.long, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  indicators = torch.zeros([pos.size(0), len(ATOM_FAMILIES)], dtype=np.long)
[04:14:11] Explicit valence for atom # 2 O, 3, is greater than permitted
Invalid,continue
generated smile: C.CC(=O)O.CCOCC(=O)O.CN.O.O=C(O)O.O=C1Cc2cccc(O)c2N1

*** Open Babel Warning in PerceiveBondOrders Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

generated smile: N=C(NC(=O)O)C(=O)O.OC1=CCC=c2c3c(cc(O)c2=C1O)C=CC(O)(O)C3O
[04:14:11] Explicit valence for atom # 8 C, 5, is greater than permitted
Invalid,continue
generated smile: CC(N)(O)O.CNC(N)=O.O=C(O)C1CCCO1.c1ccc(-c2cccnc2)cc1
generated smile: CC1C=CC(O)(C2C(=O)N3CCC2CC3C(=O)O)CC1N.O=C(O)COCC(=O)O
generated smile: CC(C)OC(=O)NON.O=C(O)OCC=C=C1C=CC(O)(F)C2CC3CCCC3C12
generated smile: CC1(C(O)(O)O)CC=CCC1O.CNCC=NCNN(C)O.O=C1CCC=CC=N1
generated smile: COCOCCCO.O=C(O)C1c2ncccc2N=C2NCCCCC21.O=C(O)O
generated smile: CC(C)CC(=O)O.CCC(N)=O.Oc1ccc(F)c(O)c1NC12CCC(O)(CC1)C2
50%|█████▌    | 1/2 [03:29<03:29, 209.98s/it]
1 sample: 1000it [03:26, 4.85it/s]
generated smile: CC1c2ccccc2C2(CCC(O)C(=O)C2(C)C)C1O.NC(=O)NC(=O)CCCO
generated smile: CC(=O)NC=O.COC=O.O=C1C(O)C2(O)CCC(O)(C2)C1c1cccc(O)c1O
generated smile: CC=CCC(N)=O.O=C(O)O.O=C1c2ccccc2N=CC23CC(CCC12)CC3O

*** Open Babel Warning in PerceiveBondOrders Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

generated smile: CC(=N)CCOP(O)O.COC(=O)O.N=C(C(=O)O)C(=O)N1CC=CC2=C1CCC2
generated smile: C.CC1C2C(C=NC1(C)N)CC1CCC3(CCCO3)CC12.O=C(O)C(O)C(O)CO

*** Open Babel Warning in PerceiveBondOrders Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

generated smile: CC1C2=C(C=CNCC(=O)O)C=C(O)C=CC2=CCC1(O)O.O=C(O)CC(O)CO
generated smile: CCC(O)C(C(=O)O)C(CC(=O)F)C1C=CC(O)CC(O)C1O.NC(=O)COCO
generated smile: C.C.NC(=O)CCC(=O)O.OC1C=CC2(C3(O)C=CC=C4OCNCC43)CCC1C2

*** Open Babel Warning in PerceiveBondOrders Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders

generated smile: O=C(O)c1ccccc1OP(O)O.OC1=CC=CC(C2=CC=C(O)CC=C2)C(O)=C1
generated smile: CC12CC3C4CC1(O)CC(O2)C41CNC(O)(O)C31F.O=C(O)CCO.O=C(O)CO
100%|██████████| 2/2 [06:57<00:00, 208.70s/it]
[2024-05-08 04:17:38,714::test::INFO] valid:18
[2024-05-08 04:17:38,714::test::INFO] stable:0

Can you help me find out what the problem is? Thank you! @Layne-Huang

Layne-Huang commented 1 month ago

Hi, PMDM could generate valid molecules for this target in my experiments. You could increase the value of 'num_samples' to give the model more attempts.
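With more samples, the log gets long, so it can help to post-filter it. Below is a minimal sketch that pulls the SMILES strings out of the `generated smile:` entries and counts their dot-separated components (in SMILES, `.` separates disconnected parts). The helper names are hypothetical, and the thread does not state how the script decides `stable`, so treat the fragment count only as a rough diagnostic. It also assumes no ring-closure bond spans a dot.

```python
import re

# Matches the "generated smile: <SMILES>" entries printed by the sampling script.
SMILES_ENTRY = re.compile(r"generated smile:\s*(\S+)")


def extract_smiles(log_text):
    """Return every SMILES string that follows a 'generated smile:' marker."""
    return SMILES_ENTRY.findall(log_text)


def fragment_count(smiles):
    """Number of dot-separated components in a SMILES string."""
    return len(smiles.split("."))


def single_fragment_only(smiles_list):
    """Keep only SMILES that describe one connected component."""
    return [s for s in smiles_list if fragment_count(s) == 1]


if __name__ == "__main__":
    # One entry copied from the log earlier in this thread.
    log = "generated smile: C.CC(=O)O.CCOCC(=O)O.CN.O.O=C(O)O.O=C1Cc2cccc(O)c2N1"
    for smi in extract_smiles(log):
        print(fragment_count(smi), smi)
```

Every SMILES string printed in the logs earlier in this thread contains at least one `.`, so each sample there consists of several disconnected pieces.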

Tianjl9 commented 1 month ago

Thank you for your answer, but I encountered another problem. This is my command line:

python -u PMDM_main/sample_for_pdb.py --ckpt /wuqi/stu/tianjl/PMDM/500.pt --pdb_path PMDM_main/my/3hmm/3hmmcut20/3hmmcut20_pocket.pdb --num_atom 35 --num_samples 10000 --sampling_type generalized

This is the error output:

Traceback (most recent call last):
  File "/wuqi/stu/tianjl/PMDM/PMDM_main/sample_for_pdb.py", line 393, in <module>
    gmol = reconstruct_from_generated(pos, new_element, indicators)
  File "/wuqi/stu/tianjl/PMDM/PMDM_main/utils/reconstruct.py", line 521, in reconstruct_from_generated
    rd_mol = postprocess_rd_mol_1(rd_mol)
  File "/wuqi/stu/tianjl/PMDM/PMDM_main/utils/reconstruct.py", line 416, in postprocess_rd_mol_1
    atom.SetNumRadicalElectrons(num_radical)
OverflowError: can't convert negative value to unsigned int

How can I solve this problem? @Layne-Huang
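The traceback shows `SetNumRadicalElectrons` being called with a negative `num_radical`. Here is a minimal sketch of one possible workaround, assuming that clamping the count to zero is acceptable for your use case. `clamp_radical_count` and `set_radicals_safely` are hypothetical helpers, not part of PMDM or RDKit. How `num_radical` is computed in `postprocess_rd_mol_1` is not shown in this thread, so this only avoids the crash and does not explain why the value goes negative.

```python
def clamp_radical_count(num_radical):
    """Return num_radical as an int, with negative values replaced by 0."""
    return max(0, int(num_radical))


def set_radicals_safely(atom, num_radical):
    """Set the radical-electron count on an atom without passing a negative value.

    `atom` is expected to expose SetNumRadicalElectrons(int), the method named
    in the traceback. Returns the value that was actually set.
    """
    value = clamp_radical_count(num_radical)
    atom.SetNumRadicalElectrons(value)
    return value
```

In `utils/reconstruct.py`, the failing call could then become `set_radicals_safely(atom, num_radical)`. An alternative that changes less chemistry is to wrap the `reconstruct_from_generated` call in `try`/`except OverflowError` and skip that sample.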