chaoyi-wu / PMC-LLaMA

The official codes for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"
549 stars, 52 forks

Collection and preprocessing of book data #23

Open boyue-jiang opened 7 months ago

boyue-jiang commented 7 months ago

Thank you for your excellent work. I am curious about how the books were collected, de-duplicated, and filtered for content. Could you please explain how you collected such a large number of books, and share the preprocessing code? Thank you for your time!