dionis / SpanishMedicaLLM

An open-source medical-domain Large Language Model (LLM) for Q&A and prompting in Spanish, built with low compute resources using fine-tuning techniques such as QLoRA on the EPFL models. Inspired by Meditron, a suite of open-source medical LLMs.
https://huggingface.co/epfl-llm
Apache License 2.0

Evaluate different medical LLMs for pretraining, fine-tuning, or both #8

Open dionis opened 5 months ago

dionis commented 5 months ago

Study the LLM models trained on Spanish-language corpora, prioritizing those that have been built on the basis of:

Expected result: a reasoned selection of which models to use for fine-tuning (and possibly pretraining) a model in a medical context.

dionis commented 5 months ago

It was implemented on a branch.

dionis commented 5 months ago

Steps to test QLoRA on the EPFL models with the Hugging Face libraries
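The steps above can be sketched as a minimal QLoRA configuration with the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries. The model name, target modules, and hyperparameters below are illustrative assumptions, not the project's final choices.

```python
# Hypothetical QLoRA setup sketch; assumes transformers, peft, and
# bitsandbytes are installed and a GPU is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Illustrative choice from the candidate list in this issue
model_id = "clibrain/Llama-2-7b-ft-instruct-es"

# 4-bit NF4 quantization: the "Q" in QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections (Llama-style module names)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

With this setup only the LoRA adapter weights are updated during training, which is what makes fine-tuning feasible on low compute resources.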

Criteria for selecting a Spanish LLM for pretraining or fine-tuning

Obtained Hugging Face models:

projecte-aina/aguila-7b (Falcon base)
clibrain/Llama-2-7b-ft-instruct-es (Llama2 base)
TheBloke/Barcenas-Mistral-7B-GGUF (Mistral base)
clibrain/lince-zero (Llama2 base)
clibrain/Llama-2-13b-ft-instruct-es (Llama2 base)
google/gemma-7b-it (Gemma base)
allenai/OLMo-7B (Olmo base)
clibrain/Llama-2-13b-ft-instruct-es-gptq-4bit (Llama2 base)
clibrain/lince-mistral-7b-it-es (Mistral base)
Kukedlc/Llama-7b-spanish (Llama2 base)
google/gemma-7b (Gemma base)
allenai/OLMo-1B (Olmo base)
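The candidate checkpoints above can be compared family by family before choosing one. A small stdlib-only sketch of that bookkeeping (the repo IDs are copied from the list; the helper name is ours):

```python
# Group the candidate Hugging Face checkpoints by base architecture,
# to compare model families before selecting one for fine-tuning.
from collections import defaultdict

CANDIDATES = [
    ("projecte-aina/aguila-7b", "Falcon"),
    ("clibrain/Llama-2-7b-ft-instruct-es", "Llama2"),
    ("TheBloke/Barcenas-Mistral-7B-GGUF", "Mistral"),
    ("clibrain/lince-zero", "Llama2"),
    ("clibrain/Llama-2-13b-ft-instruct-es", "Llama2"),
    ("google/gemma-7b-it", "Gemma"),
    ("allenai/OLMo-7B", "OLMo"),
    ("clibrain/Llama-2-13b-ft-instruct-es-gptq-4bit", "Llama2"),
    ("clibrain/lince-mistral-7b-it-es", "Mistral"),
    ("Kukedlc/Llama-7b-spanish", "Llama2"),
    ("google/gemma-7b", "Gemma"),
    ("allenai/OLMo-1B", "OLMo"),
]

def group_by_base(candidates):
    """Map each base architecture to its candidate repo IDs."""
    groups = defaultdict(list)
    for repo_id, base in candidates:
        groups[base].append(repo_id)
    return dict(groups)

for base, repos in group_by_base(CANDIDATES).items():
    print(f"{base}: {len(repos)} candidate(s)")
```

Grouping this way makes it explicit that Llama 2 derivatives dominate the shortlist, with Mistral, Gemma, Falcon, and OLMo as alternatives.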

Conclusions:

Resources about Spanish LLM models