ebanalyse / NERDA

Framework for fine-tuning pretrained transformers for Named-Entity Recognition (NER) tasks
MIT License

max_len check gives poor warning message #40

Open prhbrt opened 2 years ago

prhbrt commented 2 years ago

Change this:

msg = f'Sentence #{item} length {len(tokens)} exceeds max_len {self.max_len} and has been truncated'

to

msg = f'Sentence #{item} length {len(tokens)} exceeds max_len {self.max_len} - 2 and has been truncated, note that two tokens are used to surround the sentence with the [CLS] and [SEP] token'

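The current warning doesn't make sense: "Sentence #4 length 511 exceeds max_len 512 and has been truncated" claims that 511 exceeds 512. The sentence is actually truncated because only max_len - 2 = 510 positions are left for its tokens once [CLS] and [SEP] are added.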
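For illustration, here is a minimal sketch of a check that reports the effective limit instead of the raw max_len. It is not NERDA's actual code: the function name truncate_with_warning, the constant NUM_SPECIAL_TOKENS, and the exact wording of the message are assumptions made for this example. Only the f-string fields (item, len(tokens), max_len) follow the snippet above.

```python
import warnings

# Assumption for this sketch: [CLS] and [SEP] each occupy one position.
NUM_SPECIAL_TOKENS = 2


def truncate_with_warning(item, tokens, max_len):
    """Hypothetical helper: truncate `tokens` so they fit in `max_len`
    together with the two special tokens, and warn with the real limit."""
    effective_max = max_len - NUM_SPECIAL_TOKENS
    if len(tokens) > effective_max:
        msg = (
            f'Sentence #{item} length {len(tokens)} exceeds the effective '
            f'limit of {effective_max} tokens (max_len {max_len} minus '
            f'{NUM_SPECIAL_TOKENS} for the [CLS] and [SEP] tokens) '
            f'and has been truncated'
        )
        warnings.warn(msg)
        tokens = tokens[:effective_max]
    return tokens
```

With max_len = 512, a 511-token sentence would then be reported as exceeding 510 rather than 512, and a 510-token sentence would pass without a warning.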