Closed krzysztoffiok closed 3 years ago
This is what happens: Flair processes the text in 512-token blocks (strided). Each block gets its own transformer prediction, and the transformer embedding outputs are stitched back together.
The model stays exactly the same, as do its limitations. The context for the embedding is still only 512 tokens wide, so using this for (longer) text-level classification has no added benefit (besides not crashing).
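A rough pure-Python sketch of the splitting described above. The block size, stride value, and helper name are illustrative assumptions, not Flair's actual internals:

```python
# Hypothetical sketch of strided block processing: a long token list is
# split into fixed-size blocks, each block would get its own transformer
# forward pass, and the per-token outputs are stitched back together.

def split_into_blocks(tokens, block_size=512, stride=512):
    """Split a long token list into fixed-size blocks."""
    return [tokens[start:start + block_size]
            for start in range(0, len(tokens), stride)]

tokens = [f"tok{i}" for i in range(1200)]
blocks = split_into_blocks(tokens)
print([len(b) for b in blocks])  # → [512, 512, 176]

# Reassembling the blocks recovers the original sequence, which is why
# nothing crashes -- but each block was still embedded with at most a
# 512-token context.
stitched = [tok for block in blocks for tok in block]
assert stitched == tokens
```

Note that with a non-overlapping stride the blocks tile the text exactly; an overlapping stride would give boundary tokens some context from the neighboring block at the cost of duplicate predictions.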
You are completely right on everything you said! I think the flair documentation on this subject is lacking 😅.
Maybe the documentation should be updated, or a warning should be given when a model uses this feature.
@schelv Thank you very much for very quick response.
Please let me rephrase and further detail your answer so I'm sure I understand it properly. For example, given:
I get 3 blocks of 512, 512 and 176 tokens? Then for each block I get a 3072-dimensional embedding output from the model, so I end up with 3 embeddings of that length. Is the final entity_level_embedding for the whole text then averaged from those 3 block embeddings, and also of length 3072?
Thank you again.
Striding works for TokenClassification; for TextClassification the text is truncated to 512 tokens.
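In plain terms, truncation simply drops everything past the limit. A minimal illustration (the constant and helper name are hypothetical, not Flair code):

```python
MAX_TOKENS = 512  # typical BERT-style sequence limit

def truncate(tokens, max_tokens=MAX_TOKENS):
    # For text classification the sequence is simply cut at the limit;
    # tokens beyond position 512 never reach the model.
    return tokens[:max_tokens]

tokens = [f"tok{i}" for i in range(1200)]
print(len(truncate(tokens)))  # → 512
```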
@djstrong thank you, so I end up with only the first 512 tokens being analyzed by the model, correct?
@djstrong could you point me to a place in code where I could modify this behavior?
@djstrong thank you again.
@djstrong I understand that if I decide on a transformer model like Longformer, Flair will adopt that model's model_max_length, i.e. whatever the Longformer maximum is (4096 tokens?)?
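The practical effect of the model's own limit on the block count can be sketched with plain arithmetic (the limits shown are assumptions about typical BERT-style and Longformer-style models, not values read from Flair):

```python
# Hypothetical illustration: if the block size follows the model's
# model_max_length, a Longformer-style limit of 4096 keeps a 1200-token
# text in a single block, while a BERT-style 512 limit splits it.

def num_blocks(n_tokens, model_max_length):
    # Ceiling division: how many blocks are needed to cover the text.
    return -(-n_tokens // model_max_length)

print(num_blocks(1200, 512))   # → 3  (BERT-style limit)
print(num_blocks(1200, 4096))  # → 1  (Longformer-style limit)
```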
OK, I see it does. Thanks again.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi,
I understand that most transformer models follow BERT regarding the maximum length of an analyzed text instance, and that this value is 512 tokens.
So I deliberately started fine-tuning a classification model (ALBERT base v2) on text instances over 1000 tokens long and... nothing crashed.
How is that possible? How does Flair handle this limit? Or are those models already prepared to handle longer text instances, and is it only my ignorance that I don't know how they do it?
Any explanations will be very appreciated!
Best,