dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License

BERT-Extractive-Summarizer Multilingual #27

Closed cyberandy closed 4 years ago

cyberandy commented 4 years ago

Hi there, what would be the procedure for using this library with either BERT-Base Multilingual or single-language models like BERT for German, CamemBERT, etc.?

Congratulations on your great work!

dmmiller612 commented 4 years ago

Hello cyberandy,

You can currently use custom models as long as they follow the Hugging Face design. For example, to use the brand-new ALBERT implementation, you could do the following:

```python
from summarizer import Summarizer
from transformers import AlbertModel, AlbertTokenizer

# Load pretrained ALBERT; output_hidden_states=True exposes the layer
# embeddings the summarizer clusters over
albert_model = AlbertModel.from_pretrained('albert-base-v1', output_hidden_states=True)
albert_tokenizer = AlbertTokenizer.from_pretrained('albert-base-v1')

custom_summarizer = Summarizer(custom_model=albert_model, custom_tokenizer=albert_tokenizer)
```

If you have a custom fine-tuned model, you can pass it in the same way. I do believe the transformers library has a pretrained CamemBERT model, though.

cyberandy commented 4 years ago

Awesome, thanks @dmmiller612 for the quick reply. So in the case of German, which is indeed included in the Hugging Face distribution, I would go for:

```python
from summarizer import Summarizer
from transformers import BertModel, BertTokenizer

# German-cased BERT checkpoint from the Hugging Face model hub
bertgerman_model = BertModel.from_pretrained('bert-base-german-cased', output_hidden_states=True)
bertgerman_tokenizer = BertTokenizer.from_pretrained('bert-base-german-cased')

custom_summarizer = Summarizer(custom_model=bertgerman_model, custom_tokenizer=bertgerman_tokenizer)
```