chaoyi-wu / PMC-LLaMA

The official codes for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"
549 stars, 52 forks

Can you open source the training data? #22

Open: vinay-k12 opened this issue 7 months ago

vinay-k12 commented 7 months ago

Hi, I'm trying to replicate the experiments, but I couldn't exactly match the training data used in the paper. Could you open-source the datasets as well?

chaoyi-wu commented 7 months ago

Hello, the books we used are listed here: https://github.com/chaoyi-wu/PMC-LLaMA/blob/main/MedicalBook.xlsx. Because of licensing restrictions, I cannot share their exact contents, but you can collect them online. The other parts of the training data can be obtained from the following links:

  1. Papers: https://github.com/allenai/s2orc
  2. Instruction data: https://huggingface.co/datasets/axiong/pmc_llama_instructions