DFKI-NLP / sherlock

State-of-the-art Information Extraction

Lack of documents that are truncated without an entity being cut off #48

Open GabrielKP opened 2 years ago

GabrielKP commented 2 years ago

Currently, it seems that when the datasets are loaded, every document that gets truncated is also excluded, supposedly because its entities were truncated as well. That would mean there is not a single document that was truncated yet still contains all of its entities, which seems highly unlikely.

Is there a bug in the code? Or could it be that, for max_seq_len == 128, there really is no TACRED example in which the text is truncated but the entities are preserved?
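One way to separate the two explanations is to tally the examples independently of the loader. Below is a minimal sketch of that idea. It is not sherlock's code. The names `truncation_status` and `count_statuses` and the `(num_tokens, spans)` example format are all hypothetical. The sketch assumes token-level entity spans with exclusive end offsets and ignores special tokens. If the real pipeline adds special or entity-marker tokens, the effective cutoff would be smaller than max_seq_len. Under these assumptions, an example should be dropped only when an entity ends past the cutoff, not merely because the text was truncated.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical representation (not sherlock's data format): each example is
# (num_tokens, [(start, end), ...]) with token offsets, `end` exclusive.
Span = Tuple[int, int]
Example = Tuple[int, List[Span]]


def truncation_status(num_tokens: int, spans: List[Span], max_seq_len: int) -> str:
    """Classify one example against a max_seq_len cutoff (special tokens ignored)."""
    if num_tokens <= max_seq_len:
        return "not_truncated"
    # Truncation keeps tokens [0, max_seq_len); an entity survives if it ends by then.
    if all(end <= max_seq_len for _, end in spans):
        return "truncated_entities_kept"
    return "truncated_entity_cut"


def count_statuses(examples: Iterable[Example], max_seq_len: int) -> Counter:
    """Tally how many examples fall into each truncation category."""
    return Counter(truncation_status(n, spans, max_seq_len) for n, spans in examples)


if __name__ == "__main__":
    # Toy data, not TACRED: one short example, one long example whose entities
    # appear early, and one long example with an entity past the cutoff.
    toy = [
        (40, [(2, 4), (10, 11)]),
        (200, [(3, 5), (20, 22)]),
        (200, [(3, 5), (150, 152)]),
    ]
    print(count_statuses(toy, max_seq_len=128))
```

Suppose this tally were run over TACRED's tokenized examples with max_seq_len=128. A non-zero `truncated_entities_kept` count, set against a loader that keeps none of those examples, would point to a bug. A zero count would suggest the dataset simply has no such examples.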