Where is CID2text.json file?

xiaorandu commented 2 months ago

Thanks for sharing the great work. I was trying to run demo_pretrain_Graph.ipynb, but I cannot run the last step.

[Screenshot: 2024-05-04 at 12:48:35 PM]

In /MoleculeSTM/datasets/PubChemSTM.py, class PubChemSTM_Datasets_Graph(InMemoryDataset), I found that the dataset uses these files:

    self.SDF_file_path = os.path.join(self.root, "raw/molecules.sdf")
    self.CID2text_file = os.path.join(self.root, "raw/CID2text.json")

However, I cannot find raw/CID2text.json under the data/PubChemSTM_data folder. I was wondering whether the error above is caused by this missing file. Please let me know what you think. Thanks!
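A quick way to confirm which of the two files is actually missing is a small existence check. This is only a minimal sketch: the file names come from the paths quoted above, while the helper name `missing_raw_files` and the root path `data/PubChemSTM_data` are assumptions for illustration, not part of MoleculeSTM.

```python
import os

# File names taken from the two paths quoted from PubChemSTM.py.
EXPECTED_RAW_FILES = ("molecules.sdf", "CID2text.json")


def missing_raw_files(dataset_root):
    """Return the expected raw files that are absent under dataset_root/raw.

    Hypothetical helper, not part of the MoleculeSTM code base.
    """
    raw_dir = os.path.join(dataset_root, "raw")
    return [
        name for name in EXPECTED_RAW_FILES
        if not os.path.isfile(os.path.join(raw_dir, name))
    ]


if __name__ == "__main__":
    # Assumed location, based on the folder named in this comment.
    root = "data/PubChemSTM_data"
    missing = missing_raw_files(root)
    if missing:
        print("Missing under {}/raw: {}".format(root, ", ".join(missing)))
    else:
        print("All expected raw files are present.")
```

If `CID2text.json` is reported missing, the dataset class has nothing to read the text descriptions from, which would be consistent with a failure at the dataset-loading step.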

xiaorandu commented 2 months ago

Hi, I tried loading the dataset from the file "raw/CID2name.json" instead, since I can find that one in the repo. I am not sure whether it will work.

    dataset = PubChemSTM_SubDatasets_Graph(dataset_root, size=10)

Here is what I get before "5.3.4 Start Training":

    print(f"dataset size is: {dataset.size}")
    for i in range(len(dataset)):
        print(dataset.CID_list[i], dataset.text_list[i])

    dataset size is: 10
    29927686 Scutellarin(1-)
    29982675 9-cis-4-oxoretinoate
    29918871 Monacolin J carboxylate
    29986894 11-cis-retinoate
    29919282 (R)-imazamox(1-)
    29919280 (S)-imazamox(1-)
    29986450 6-(O-phosphocholine)oxyhexanoate(1-)
    29918994 Tenofovir(1-)
    29922751 Cidofovir(1-)
    29986451 6-(O-phosphocholine)oxyhexanoic acid

Then, when I run:

    for e in range(3):
        print("Epoch {}".format(e))
        train(e, dataloader, **kwargs)

I get this:

[Screenshot: 2024-05-04 at 1:41:11 PM]

Could you help me figure this out? Thanks!

MohdOwais22 commented 1 week ago

If you follow the steps given in the preprocessing instructions, the CID2text.json file will appear in the data/PubChemSTM_data/raw folder.
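Once preprocessing has run, a short sanity check can confirm that the generated file parses and is not empty. This is a rough sketch under stated assumptions: the layout of CID2text.json is not shown in this thread, so the code only counts top-level entries, and `count_entries` is a hypothetical helper rather than anything provided by MoleculeSTM.

```python
import json
import os


def count_entries(json_path):
    """Load a JSON file and return the number of top-level entries.

    Hypothetical helper for illustration only.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # len() works for both a JSON object (dict) and a JSON array (list);
    # which of the two CID2text.json uses is an open assumption here.
    return len(data)


if __name__ == "__main__":
    # Path assembled from the folder named in the reply above.
    path = os.path.join("data/PubChemSTM_data", "raw", "CID2text.json")
    if os.path.isfile(path):
        print("{} has {} top-level entries".format(path, count_entries(path)))
    else:
        print("{} not found; preprocessing may not have run yet".format(path))
```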

chao1224 / MoleculeSTM

Where is CID2text.json file? #25