SupritYoung / Zhongjing

A Chinese medical ChatGPT based on LLaMA, trained on a large-scale pre-training corpus and a multi-turn dialogue dataset.
Apache License 2.0

Source of pre-training data #4

Closed bbsngg closed 1 year ago

bbsngg commented 1 year ago

Thanks for your work. I would like to know the source of the KG, medical record, and report data in the pre-training data. Could you provide it, please?

SupritYoung commented 1 year ago

We use the CMeKG Chinese medical knowledge graph; you can contact the authors of the original paper to request it. The medical record and report data cannot be shared because of the hospital's privacy policy. But if you are interested in conducting academic research, we can discuss the details of a collaboration. Email: suprit@foxmail.com.

bbsngg commented 1 year ago

Thanks for sharing!