BFeng14 / ActFound

Official implementation for ActFound: A bioactivity foundation model using pairwise meta-learning
GNU General Public License v3.0

How is the 'chembl_split.json' file generated? #15

Open Leslie-yq opened 1 week ago

Leslie-yq commented 1 week ago

Hi author, I noticed that 'chembl_preprocess.py' contains the code to preprocess the ChEMBL dataset and generate 'chembl_processed_chembl32.csv', but I did not find the code that generates 'chembl_split.json' in 'data_chembl_Assay_reg.py'. How is the dataset split and saved as 'chembl_split.json'? Looking forward to your reply!

BFeng14 commented 4 days ago

Hi Leslie, chembl_split.json contains the train/valid/test split of the ChEMBL dataset, and it was simply generated randomly. I wrote the code in an ipynb notebook and cannot find it now, but it was just a random shuffle followed by splitting off 500 assays for validation and 500 for test.
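
For reference, a minimal sketch of how such a split could be regenerated (the CSV file name comes from the thread, but the assay-ID column name, JSON key names, and seed are assumptions, and the result will not reproduce the original split):

```python
import json
import random

import pandas as pd

# Assumed input: the preprocessed ChEMBL table, one row per measurement,
# with an assay-ID column ("assay_id" is an assumed column name).
df = pd.read_csv("chembl_processed_chembl32.csv")
assay_ids = sorted(df["assay_id"].unique().tolist())

# Random shuffle, then split off 500 assays each for valid and test,
# as described above. The seed is arbitrary; the original split used none.
random.seed(42)
random.shuffle(assay_ids)
split = {
    "valid": assay_ids[:500],
    "test": assay_ids[500:1000],
    "train": assay_ids[1000:],
}

with open("chembl_split.json", "w") as f:
    json.dump(split, f)
```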

Leslie-yq commented 11 hours ago

OK, I see. Thanks for your reply! I am now using a dataset I created myself, which contains 965 train assays and 437 test assays; there is no validation set. The maximum number of ligands per assay is 282, and the average is about 75. However, when I modified and ran the main_reg.py code, the output meta_batch_loss['loss'] was NaN.

number of training set : 965
number of testing set : 437
100%|███████████████████████████████████████| 1402/1402 [14:17<00:00,  1.63it/s]
train_cnt:965, test_cnt:437
train_cnt:965, test_cnt:437
282 75.7867332382311
10
/home/Yq/anaconda3/envs/ActFound_e3fp_env/lib/python3.8/site-packages/numpy/lib/function_base.py:2854: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/Yq/anaconda3/envs/ActFound_e3fp_env/lib/python3.8/site-packages/numpy/lib/function_base.py:2855: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
epoch is: 0, mean rmse is: 6.234
epoch is: 0, r2: mean is: 0.000, median is: 0.000, cnt>0.3 is: 0.000
epoch is: 0, R2os: mean is: nan, median is: nan, cnt>0.3 is: 0.000
/home/Yq/anaconda3/envs/ActFound_e3fp_env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
/home/Yq/anaconda3/envs/ActFound_e3fp_env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:156: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
/home/Yq/anaconda3/envs/ActFound_e3fp_env/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:728: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
Current meta_batch_loss: nan
Current meta_batch_loss: nan
Current meta_batch_loss: nan
Current meta_batch_loss: nan
Current meta_batch_loss: nan

What could be the reason for this? I have checked that the shapes of the data outputs are fine. Which settings should I modify, or which of my steps is wrong?
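
(Editor's note: the numpy RuntimeWarnings above come from np.corrcoef dividing by a zero standard deviation, which happens when an assay's labels or predictions are constant, and it would also explain the NaN R2os values. A minimal sketch for screening a custom dataset for such assays before training; the file name and column names here are assumptions:)

```python
import numpy as np
import pandas as pd

# Assumed layout: one row per ligand measurement, with an assay-ID
# column and a label column (both column names are assumptions).
df = pd.read_csv("my_dataset.csv")

for assay_id, group in df.groupby("assay_id"):
    labels = group["label"].to_numpy(dtype=float)
    # NaN labels propagate into the loss; zero-variance labels make
    # np.corrcoef divide by a zero stddev, matching the warnings above.
    if np.isnan(labels).any():
        print(f"{assay_id}: contains NaN labels")
    elif labels.std() == 0:
        print(f"{assay_id}: constant labels (std == 0)")
```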