Closed klee5264 closed 3 years ago
Hi,
Thank you for your interest in DEEPScreen. "chembl27_preprocessed_filtered_act_inact_comps_10.0_20.0_blast_comp_0.2.txt" is the updated version of the ChEMBL v23 dataset mentioned in our paper. This one is constructed using ChEMBL database version 27, whereas the old one with 769,935 data points was constructed using v23. That is why it has many more data points than the old one. You do not have to do any filtering on this dataset at all. If you wish to train/test a model, please follow the instructions provided in the README file in the PyTorch branch of our repo (which is the new and the main/default branch).
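For anyone who wants to sanity-check the pair count themselves, here is a minimal sketch. It assumes (this format is not confirmed in the thread) that each line of the act_inact file holds one target set, i.e. a target ID suffixed with `_act` or `_inact`, a tab, then a comma-separated list of compound IDs:

```python
# Sketch: count target-ligand pairs in an act_inact-style file.
# ASSUMED line format (not stated in this thread, verify against the repo):
#   "TARGETID_act\tCHEMBLID,CHEMBLID,..."  plus a matching "_inact" line.

def count_pairs(lines):
    """Sum the number of compound IDs over all target lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Everything after the first tab is the comma-separated compound list.
        _, _, comps = line.partition("\t")
        if comps:
            total += len(comps.split(","))
    return total

# Tiny synthetic example: 3 active + 2 inactive = 5 pairs.
sample = [
    "CHEMBL286_act\tCHEMBL1,CHEMBL2,CHEMBL3",
    "CHEMBL286_inact\tCHEMBL4,CHEMBL5",
]
print(count_pairs(sample))  # 5
```

To run it on the real file, pass `open("chembl27_preprocessed_filtered_act_inact_comps_10.0_20.0_blast_comp_0.2.txt")` instead of `sample`; if the actual delimiter differs, adjust the `partition`/`split` calls accordingly.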
Superb, thanks a lot for the positive and prompt answer!
Hello there,
Thanks for sharing such a nice idea and the code. It is motivating!
Well, I am just beginning to work through your code and have run into an issue. Please correct me if I am wrong. According to the README, the file named 'chembl27_preprocessed_filtered_act_inact_comps_10.0_20.0_blast_comp_0.2.txt' should be the training dataset that you obtained by filtering the ChEMBL v23 data (about 15M data points), right?
So I expected the file to contain 769,935 data points, matching the number in the paper, but I found 2,292,989 target-ligand pairs, nearly three times as many. Did you update the file by augmenting the data, or do I need to do some processing to get down to 769,935 pairs? I am a little confused.
I'd appreciate it if you could help me with this.
Thanks