WEEE-Open / skeeelled

An e-learning platform for the modern age
https://weee-open.github.io/skeeelled

Remove all square brackets from string at inference time #76

Closed by e-caste 2 years ago

e-caste commented 2 years ago

This is needed because the current model, https://huggingface.co/neuraly/bert-base-italian-cased-sentiment, uses the [ and ] characters to define mask tokens, so everything between those characters is ignored by the model.

With square brackets (neutral at 81%):

Screenshot 2022-03-23 at 12 02 51

Without square brackets (negative at 77%):

Screenshot 2022-03-23 at 12 03 19

This issue stems from the suspicious results obtained when testing the Neuraly transformer (only 8% accuracy against our currently labeled dataset) in commit https://github.com/WEEE-Open/skeeelled/commit/71fa6f98cdcd28aa060ce4352d071414d9c8adb1.

papadeiv commented 2 years ago

The issues are caused by character combinations other than those associated with masking. In commit 7c76f85 these characters are removed from the string passed to inference but retained in the original comment. It is now working properly.
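
For readers following along, here is a minimal sketch of the general idea: clean a copy of the text for inference and leave the stored comment untouched. It is not the code from commit 7c76f85. The function name `sanitize_for_inference`, the `comment` dictionary, and the choice to strip only `[` and `]` are assumptions made for illustration, since the exact set of characters removed is not listed in this thread.

```python
import re

# Hypothetical pattern: only the square brackets named in the issue title.
# The characters actually removed in the project may differ.
SQUARE_BRACKETS = re.compile(r"[\[\]]")


def sanitize_for_inference(text: str) -> str:
    """Return a copy of `text` with every '[' and ']' removed."""
    return SQUARE_BRACKETS.sub("", text)


# Hypothetical stored comment; the original text is never modified.
comment = {"text": "Il corso [secondo me] non è utile"}

# Only this cleaned copy would be handed to the sentiment model.
model_input = sanitize_for_inference(comment["text"])

print(comment["text"])  # Il corso [secondo me] non è utile
print(model_input)      # Il corso secondo me non è utile
```

Because Python strings are immutable, the cleaned copy cannot alter the stored comment, which matches the "removed from the inferred string but retained in the original comment" behaviour described above.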