dionis / SpanishMedicaLLM

An open-source medical-context Large Language Model (LLM) for Q&A and prompting in Spanish, using fine-tuning techniques with QLoRA and EPFL on low compute resources. Inspired by Meditron, a suite of open-source medical Large Language Models (LLMs).
https://huggingface.co/epfl-llm
Apache License 2.0

Create a first version of a corpus of medical texts in Spanish for the creation of an LLM #17

Open dionis opened 7 months ago

dionis commented 7 months ago

Taking the corpora used to build Meditron as a reference, create a corpus with the same characteristics for training an LLM.

Sources to consult:

Expected results:

A medical corpus in Spanish that can be used as input for fine-tuning or training an LLM.

A document that describes the sources used to build the corpus, the criteria for selecting them, and the characteristics of each source.
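
One possible way to keep the source documentation machine-readable alongside the corpus is a small manifest with one record per source. The sketch below is only an illustration: the `CorpusSource` fields, the `sources_to_jsonl` helper, and the example entry (including its URL and license) are hypothetical placeholders, not sources or a format decided for this project.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CorpusSource:
    """One candidate source for the corpus (all fields are assumptions)."""
    name: str
    url: str
    license: str
    language: str
    document_count: int
    selection_rationale: str


def sources_to_jsonl(sources):
    """Serialize sources as JSON Lines, one source per line."""
    # ensure_ascii=False keeps Spanish characters (á, ñ, ...) readable.
    return "\n".join(json.dumps(asdict(s), ensure_ascii=False) for s in sources)


# Hypothetical placeholder entry; replace with a real, verified source.
example = CorpusSource(
    name="example-guidelines",
    url="https://example.org/guidelines",
    license="CC-BY-4.0",
    language="es",
    document_count=0,
    selection_rationale="Placeholder: guías clínicas en español.",
)

manifest = sources_to_jsonl([example])
print(manifest)
```

A manifest like this could later be rendered into the written document, so the list of sources and the selection criteria stay in sync with the data actually collected.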

Note: See how this is done in the article "Meditron-70b: Scaling medical pretraining for large language models", its annexes, and the source code published with the presented model.