MaartenGr / KeyBERT

Minimal keyword extraction with BERT
https://MaartenGr.github.io/KeyBERT/
MIT License

Highlight n_grams index error #77

Open aucan opened 2 years ago

aucan commented 2 years ago

https://github.com/MaartenGr/KeyBERT/blob/6ab9af1cfe74a126e709539a2467426d0881945c/keybert/_highlight.py#L94

This line should be skip = skip - 2.

MaartenGr commented 2 years ago

Thank you for the issue. However, if I were to change it to skip = skip - 2, a significant portion of the text would not be seen, as skip would go into negative values. Could you go into more depth on your issue and create a reproducible example?
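The disagreement here is about how the skip counter should be updated after an n-gram has been highlighted. The following is a minimal sketch of one way to keep that counter consistent. It assumes whitespace tokenization, a hypothetical highlight_ngrams helper and a simple [...] marker, and it is not the code in KeyBERT's _highlight.py. The counter is derived from the length of the matched n-gram and decremented by one per consumed token, so it never goes negative and never skips past the end of the text.

```python
from typing import List


def highlight_ngrams(doc: str, keywords: List[str], max_len: int = 3) -> str:
    """Hypothetical sketch: wrap keyword n-grams in [brackets] using a skip counter.

    Not KeyBERT's implementation; it only illustrates the skip bookkeeping.
    """
    keyword_set = {k.lower() for k in keywords}
    tokens = doc.split()  # assumption: plain whitespace tokenization
    out: List[str] = []
    skip = 0  # tokens still to consume because a previous match already emitted them

    for i, token in enumerate(tokens):
        if skip > 0:
            # This token was already emitted inside the previous highlight.
            skip -= 1
            continue

        match_len = 0
        # Try the longest n-gram starting at token i first, never past the end of the text.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in keyword_set:
                match_len = n
                break

        if match_len:
            out.append("[" + " ".join(tokens[i:i + match_len]) + "]")
            # The current token is handled now, so only match_len - 1 tokens remain to skip.
            skip = match_len - 1
        else:
            out.append(token)

    return " ".join(out)


print(highlight_ngrams("keyword extraction with bert is easy", ["keyword extraction", "bert"]))
# [keyword extraction] with [bert] is easy
```

Because skip is computed automatically from each match (match_len - 1) rather than adjusted by a fixed constant, every token is either emitted once or consumed once. A fixed offset such as "- 2" would break this for n-grams of other lengths.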

aucan commented 2 years ago

The error is not simple enough to be solved by changing a single variable. Example code: https://gist.github.com/aucan/57134dc83531c8e29c3e69577ed72eae

MaartenGr commented 2 years ago

Thank you for the reproducible code. I will have to look into this, as it seems that nskip, which I prefer to be calculated automatically, does not fully solve the issue.