MangoKiller / MolTC

MIT License

Creation of pre-training dataset #3

Open shrimonmuke0202 opened 1 month ago

shrimonmuke0202 commented 1 month ago

Hi, this work is fascinating. Could you provide the source code used to create the MolTC datasets?

MangoKiller commented 3 weeks ago

The source code for processing the Gibbs free energy pre-training data is https://github.com/MangoKiller/MolTC/blob/main/pretrain_data.py, and the corresponding data file is https://huggingface.co/chang04/ddi/blob/main/data/solve_data/pre_train_data.txt

shrimonmuke0202 commented 3 weeks ago

Thanks for your reply. I want to create my own dataset like the Gibbs free energy dataset you created. Could you please share how you curated it?


MangoKiller commented 3 weeks ago

We did not do much processing on the pre-training dataset because its format is already consistent with that of the downstream tasks. The pre-training dataset comes from the paper "Transfer learning for solvation free energies: From quantum chemistry to experiments". I'm not sure whether this answers your question.

shrimonmuke0202 commented 2 weeks ago

I have another question: could you please clarify how you saved the LoRA model?