FreedomIntelligence / HuatuoGPT

HuatuoGPT, Towards Taming Language Models To Be a Doctor. (An Open Medical GPT)
Apache License 2.0

How can we tell apart the distilled data from the real-world data in the HuatuoGPT-sft-data-v1 dataset? #16

Closed: Smile-L closed this issue 11 months ago.

Smile-L commented 1 year ago

How can we tell apart the distilled data from the real-world data in the HuatuoGPT-sft-data-v1 dataset?

jymChen commented 1 year ago

Hi @Smile-L ,

Thanks for your interest.

The HuatuoGPT-sft-data-v1 dataset is currently shuffled, which makes it difficult to separate the distilled data from the real-world data. To address this, we plan to label the source of each sample in a future release.

Best, Junying