Open chunhualiao opened 2 months ago
OKOK.
@bin123apple a reminder for this ticket. The way we generate the training dataset is a key contribution of the paper. This repo should let people reproduce the dataset generation process.
OKOK, I am modifying some of the dataset cleaning code. Once that is finished, I will upload everything.
Just uploaded the dataset generation code and the dataset. The detailed steps are in README/Dataset Generation.
The current README has instructions for reproducing the evaluation results and for running inference.
I cannot find instructions to reproduce the dataset generation and fine-tuning steps. Please add them when you are ready.
Another thing is to make sure all the steps can be run on NERSC's Perlmutter. I need to run experiments on that machine.
Please let me know when you have something for me to try on Perlmutter.
Thanks.