Open xiaorandu opened 2 months ago
Hi, I tried to load the dataset from the file `raw/CID2name.json`, since that is the one I can find in the repo, though I am not sure whether it will work: `dataset = PubChemSTM_SubDatasets_Graph(dataset_root, size=10)`
And here is what I get before 5.3.4 Start Training: `print(f"dataset size is: {dataset.size}")` `for i in range(len(dataset)): print(dataset.CID_list[i], dataset.text_list[i])`
dataset size is: 10
29927686 Scutellarin(1-)
29982675 9-cis-4-oxoretinoate
29918871 Monacolin J carboxylate
29986894 11-cis-retinoate
29919282 (R)-imazamox(1-)
29919280 (S)-imazamox(1-)
29986450 6-(O-phosphocholine)oxyhexanoate(1-)
29918994 Tenofovir(1-)
29922751 Cidofovir(1-)
29986451 6-(O-phosphocholine)oxyhexanoic acid
Then when I run: `for e in range(3): print("Epoch {}".format(e)); train(e, dataloader, **kwargs)`
Could you help me figure this out? Thanks!
If you follow the preprocessing steps, the `CID2text.json` file will appear in the `data/PubChemSTM_data/raw` folder.
Thanks for sharing the great work. I was trying to run `demo_pretrain_Graph.ipynb`, but I cannot run the last step.

![Screenshot 2024-05-04 at 12 48 35 PM](https://github.com/chao1224/MoleculeSTM/assets/100817018/494fae4b-7c5a-4d40-a4f9-b00c97d40241)
I found that in `/MoleculeSTM/datasets/PubChemSTM.py`, the class `PubChemSTM_Datasets_Graph(InMemoryDataset)` uses these files: `self.SDF_file_path = os.path.join(self.root, "raw/molecules.sdf")` and `self.CID2text_file = os.path.join(self.root, "raw/CID2text.json")`,
but I cannot find `raw/CID2text.json` under the `data/PubChemSTM_data` folder. I was wondering whether the error above is related to this missing file. Please let me know if you have any concerns. Thanks!
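For anyone hitting the same problem, a quick way to confirm whether the raw files are actually in place before building the dataset is a small existence check like the one below. This is only a sketch: the two relative paths are the ones quoted from `PubChemSTM.py` above, while the helper name `find_missing_raw_files` and the `data/PubChemSTM_data` root are assumptions for illustration and not part of MoleculeSTM.

```python
import os

# Relative paths quoted from PubChemSTM.py in this thread.
REQUIRED_RAW_FILES = ("raw/molecules.sdf", "raw/CID2text.json")


def find_missing_raw_files(dataset_root, required=REQUIRED_RAW_FILES):
    """Hypothetical helper: return the required paths that are absent under dataset_root."""
    return [
        rel for rel in required
        if not os.path.isfile(os.path.join(dataset_root, rel))
    ]


if __name__ == "__main__":
    # Assumed dataset root; adjust to wherever your data folder lives.
    dataset_root = "data/PubChemSTM_data"
    missing = find_missing_raw_files(dataset_root)
    if missing:
        print("Missing raw files:", ", ".join(missing))
    else:
        print("All expected raw files are present.")
```

If `raw/CID2text.json` shows up as missing, that would be consistent with the preprocessing steps not having been run yet.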