Closed ZJL0111 closed 3 months ago
If I understand correctly, you are asking why there is no instruction part in the data. Actually, we do have it, but it is stored in a separate file. If you load the dataset with the Hugging Face datasets library (see https://huggingface.co/datasets/osunlp/SMolInstruct), the instruction will be contained in the input of every sample.
Please feel free to reach out if anything is not clear enough. Thanks.
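For anyone reading later, here is a minimal sketch of what loading through the datasets library could look like. It assumes the `datasets` package is installed and that a `train` split exists. The helper names `load_smolinstruct` and `get_model_input` are made up for illustration and are not part of the repo. Only the `input` field is confirmed by the reply above.

```python
def load_smolinstruct(split="train"):
    """Load one split of SMolInstruct from the Hugging Face Hub (needs network access).

    Hypothetical helper; the split name "train" is an assumption.
    """
    # Imported lazily so the helper below can be used without `datasets` installed.
    from datasets import load_dataset

    # Depending on your `datasets` version, extra arguments may be required
    # for datasets that ship a loading script; check the dataset card.
    return load_dataset("osunlp/SMolInstruct", split=split)


def get_model_input(sample):
    """Return the text fed to the model.

    Per the reply above, the instruction is already contained in `input`.
    """
    return sample["input"]


# Usage (downloads data, so it is left commented out here):
# train = load_smolinstruct("train")
# print(get_model_input(train[0]))
```

Printing the `input` of a few samples is a quick way to check that the instruction text is there before starting SFT.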
Closing this issue due to no further update. Please feel free to reopen it if needed :)
thanks!
Hi, thanks for your work; it helps me a lot with my molecule SFT task. I have a question about the molecule-caption data. In the paper, you said your molecule data comes from mol_instructions and ChEBI-20. However, apart from the difference between SMILES and SELFIES, I also find that in your molecule dataset there is no task description part. For example, in mol_instructions
while in the SMolInstruct dataset, I get
So in your actual training, do you use the 'instruction' part of the data, as in mol_instructions, or not? If not, why does the demonstration in the usage section include
Looking forward to your reply.