flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

Fine-tuning the language model #53

Closed petermartigny closed 6 years ago

petermartigny commented 6 years ago

Hi,

I've discovered the flair framework recently and the experience so far has been great! Following what has been done by Howard and Ruder with ULMFiT, and others, I would be interested in fine-tuning the language models on custom datasets and then plugging in a custom layer to do some tasks.

I think I can work out the language model fine-tuning by downloading one of your pre-trained models and using it as the initialization for language model training. However, for the downstream tasks, I would like to first train only e.g. the classification layer, and then gradually unfreeze and fine-tune the language model layers.

Thank you very much for your help!

alanakbik commented 6 years ago

Hi Peter,

that's a great idea and we'd be very interested to see how it affects downstream NLP tasks!

I think the good news is that fine-tuning the language model should be very easy: you can load a pre-trained LM and then pass it to the LanguageModelTrainer to fine-tune on your target domain corpus:

# load existing language model
language_model = LanguageModel.load_language_model('/path/to/language/model.pt')

# load target domain corpus 
corpus: TextCorpus = TextCorpus('path/to/your/domain/corpus',
                                language_model.dictionary,
                                language_model.is_forward_lm,
                                character_level=True)

# pass the trained language model to the trainer, along with the new corpus
trainer = LanguageModelTrainer(language_model, corpus)

# continue training the model on the new corpus
trainer.train('./results', sequence_length=250, mini_batch_size=100, learning_rate=20)

The pre-trained language models we distribute are downloaded into ~/.flair/embeddings when you first call them. So the English news forward model, for instance, can be found at ~/.flair/embeddings/lm-news-english-forward-v0.2rc.pt. You could try fine-tuning one of these on the target corpus.

With regards to the additional layers, I have to first study the ULMFit paper in greater detail (probably sometime next week). If you have any progress to share on this, we'd appreciate it!

petermartigny commented 6 years ago

Thanks for your answer Alan,

There are several interesting ideas in the ULMFiT paper; I think the gradual unfreezing of layers could be added to Flair first. I will probably look at it next week. There's a freeze() method in the fast.ai code that we could adapt here.
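For reference, the fast.ai-style freeze()/unfreeze() pattern boils down to toggling requires_grad on parameter groups, then unfreezing one group per stage from the top down. Below is a minimal generic PyTorch sketch of the idea, not Flair or fast.ai API; the two-layer "encoder" and "head" model is a made-up stand-in:

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    # fast.ai-style freeze()/unfreeze(): toggle requires_grad so the
    # optimizer skips (or updates) this parameter group
    for p in module.parameters():
        p.requires_grad = flag

# Hypothetical model: a two-layer "encoder" standing in for the LM,
# plus a task-specific classification head
model = nn.ModuleDict({
    "encoder": nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8)),
    "head": nn.Linear(8, 2),
})

# Stage 0: freeze the encoder, train only the head
set_trainable(model["encoder"], False)
stage0_trainable = [n for n, p in model.named_parameters() if p.requires_grad]

# Stage 1: additionally unfreeze the top encoder layer
set_trainable(model["encoder"][1], True)

# Stage 2: full fine-tuning of all layers
set_trainable(model["encoder"][0], True)
```

In the ULMFiT recipe each stage corresponds to a few epochs of training before the next group is unfrozen, which is what would need to be wired into Flair's trainer loop.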

alanakbik commented 6 years ago

Hello Peter,

that's great! Please let us know if that works - we'd be happy to include it in Flair!

smutuvi commented 5 years ago

I am trying to fine-tune a language model on a target corpus but am getting the following error: TypeError: unsupported operand type(s) for /: 'str' and 'str'

My script is as follows:

from pathlib import Path
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# load existing language model
language_model = LanguageModel.load_language_model('./best-lm.pt')

# load target domain corpus
corpus: TextCorpus = TextCorpus('./corpus',
                                language_model.dictionary,
                                language_model.is_forward_lm,
                                character_level=True)

# pass the trained language model to the trainer, along with the new corpus
trainer = LanguageModelTrainer(language_model, corpus)

# continue training the model on the new corpus
trainer.train('./results', sequence_length=250, mini_batch_size=100, learning_rate=20, max_epochs=1)

I would appreciate any help resolving this.

alanakbik commented 5 years ago

Hello @smutuvi you need to pass a Path (instead of string) to the corpus to indicate the path to the data folder, like this:

corpus: TextCorpus = TextCorpus(Path('./corpus'),
                                language_model.dictionary,
                                language_model.is_forward_lm,
                                character_level=True)

Hope this helps!

smutuvi commented 5 years ago

Thank you @alanakbik. It works!

I'm also working on a Swahili LM. I'll share it with you soon.

alanakbik commented 5 years ago

Cool - a Swahili LM would be great to have in Flair! Look forward to hearing about your results!

codemaster-22 commented 3 years ago

Any idea what the corpus size should be for fine-tuning? I am planning to fine-tune the 'news-forward' model on a social media corpus, so could you please give some guidance on corpus size? Currently I am thinking of a 50-million-word corpus. @alanakbik
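There is no official size threshold in the thread, but before committing to a corpus it can help to verify its actual word count. A quick sketch along these lines (the directory layout and the *.txt glob are assumptions about how the corpus is stored):

```python
from pathlib import Path

def count_words(corpus_dir: str) -> int:
    # Sum whitespace-separated tokens across all text files in the corpus
    total = 0
    for path in Path(corpus_dir).rglob("*.txt"):
        with open(path, encoding="utf-8") as f:
            for line in f:
                total += len(line.split())
    return total
```

Note that Flair's TextCorpus expects the corpus split into train/, valid.txt, and test.txt, so the count should cover the training split that will actually be seen during fine-tuning.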