explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

How to use PreTrained BERT Model for Textclassification? #4194

Closed SimonF89 closed 5 years ago

SimonF89 commented 5 years ago

Hi, I saw the new release of spaCy v2.1 and the new BERT model for German. I am new to this area, but would like to get started with the combination of BERT and spaCy.

I would like to classify German free text requirements. Since we don't have a lot of training data available (about 100-200 sentences per class with about 6 requirement classes), I became interested in BERT, which is why I would like to test it for this use case.

How large must the training data be for us to achieve meaningful results with the pre-trained model? Are there any examples of using the new models?

Thank you in advance!


adrianeboyd commented 5 years ago

An example script for text classification is in the spacy-pytorch-transformers repository:

https://github.com/explosion/spacy-pytorch-transformers/blob/master/examples/train_textcat.py

We have seen promising initial results with just 200 instances for English, but it really depends on the data, so you'll have to try it out for your task.
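One way to "try it out" with a small dataset is to hold out a fixed dev set and train on progressively larger subsets, to see where the score levels off. The sketch below only prepares the data and uses the standard library alone; the function names, the label names, and the placeholder sentences are hypothetical. It assumes the `(text, {"cats": {...}})` annotation format used by the spaCy v2 text classifier, and the training step itself is left to the example script linked above.

```python
import random
from collections import defaultdict


def to_textcat_examples(pairs, labels):
    """Convert (text, label) pairs to (text, {"cats": {...}}) tuples.

    Assumes mutually exclusive classes: exactly one label is True per text.
    """
    return [
        (text, {"cats": {name: name == label for name in labels}})
        for text, label in pairs
    ]


def stratified_split(pairs, dev_fraction=0.2, seed=0):
    """Split per class so that every label appears in both train and dev."""
    by_label = defaultdict(list)
    for text, label in pairs:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, dev = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_dev = max(1, int(len(items) * dev_fraction))
        dev.extend(items[:n_dev])
        train.extend(items[n_dev:])
    return train, dev


def learning_curve_subsets(train, sizes_per_class, seed=0):
    """Yield (size, subset) pairs with at most `size` examples per class.

    Each class is shuffled once, so smaller subsets are contained in larger ones.
    """
    by_label = defaultdict(list)
    for text, label in train:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    for label in sorted(by_label):
        rng.shuffle(by_label[label])
    for size in sizes_per_class:
        subset = [
            example
            for label in sorted(by_label)
            for example in by_label[label][:size]
        ]
        yield size, subset


if __name__ == "__main__":
    # Placeholder data: 6 hypothetical requirement classes, 150 sentences each.
    labels = [f"CLASS_{i}" for i in range(6)]
    pairs = [(f"Anforderung {n} ({label})", label) for label in labels for n in range(150)]

    train, dev = stratified_split(pairs, dev_fraction=0.2)
    dev_examples = to_textcat_examples(dev, labels)
    for size, subset in learning_curve_subsets(train, [25, 50, 100]):
        train_examples = to_textcat_examples(subset, labels)
        # Train on train_examples and evaluate on dev_examples here.
        print(size, len(train_examples), len(dev_examples))
```

Keeping the dev set fixed across all subset sizes makes the scores comparable, so a flat curve between two sizes suggests that more data of the same kind is unlikely to help much.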

SimonF89 commented 5 years ago

Thanks for the quick response! Looking forward to testing it! =)
