biblissima / collatinus

Sources of Collatinus software - Latin lemmatizer, morphological analyzer and scansion
http://outils.biblissima.fr/en/collatinus
GNU General Public License v3.0

Collatinus OSX hangs when tagging certain abbreviations #51

Open bnagy opened 5 years ago

bnagy commented 5 years ago

[copy also sent via email]

Hello,

I apologize for writing to you in English, but my French grammar is horrible. Still, I read French fairly well, so feel free to reply in French if you like :)

I have encountered some bugs while using Collatinus 11.1 (full) for OSX. I have been using the TCP server and the statistical tagger through a custom Python wrapper. Overall, it works very well, and I have tagged ~1.8 million sentences. However, certain words cause the server to go into what looks like an infinite loop (100% CPU utilisation, and it no longer responds correctly to further tagging requests).

Based on experimentation, I think the main issues are with abbreviations. Here is the list of words I have discovered so far:

Cn, Sex, Post, Pro, Cap, Ser, Oct, Ap, Kal, Tib, St, Pl

You should be able to replicate the issue by sending a remote tag request with the client, e.g.:

/Applications/Collatinus_11.1.app/Contents/MacOS/Client_C11 -P3 "Ap"
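
When reproducing this from a script, it may help to put a timeout around the client call so the script does not block if the client never returns. A minimal sketch in Python, assuming the client path and the `-P3` option exactly as in the command above. `run_with_timeout` and `tag` are hypothetical helpers, not part of Collatinus, and it is an assumption that the client itself blocks while the server loops:

```python
import subprocess

# Path and option copied from the command above; adjust for your install.
CLIENT = "/Applications/Collatinus_11.1.app/Contents/MacOS/Client_C11"


def run_with_timeout(cmd, timeout_s=10.0):
    """Run cmd and return its stdout, or None if it exceeds timeout_s seconds.

    Hypothetical helper: it only protects the calling script. It does not
    recover a server that is already stuck in a loop.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return result.stdout


def tag(text, timeout_s=10.0):
    """Send one tag request through the client (assumed command-line shape)."""
    return run_with_timeout([CLIENT, "-P3", text], timeout_s)
```

A `None` result from `tag("Ap")` would then flag the input as a candidate trigger, though the server would probably still need a manual restart afterwards.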

Please let me know if you would like any more information. I'd be happy to test any updated builds on my dataset.

Thank you for the software!

PhVerkerk commented 2 years ago

Sorry for the delay! Yes, there is an endless loop when a sentence ends with an abbreviation. It should be patched in the next version, 11.3, which is about to be published. I'll let you know when we publish it.
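
Until a fixed build is available, a client-side wrapper could set aside sentences whose last word is one of the abbreviations listed in the report. A minimal sketch in Python, assuming the trigger is the final word of the sentence, that matching is case-sensitive, and that the list from the report is incomplete. `ends_with_known_abbreviation` is a hypothetical helper name, not part of Collatinus:

```python
import re

# Abbreviations copied from the report; the real set is probably larger.
KNOWN_TRIGGERS = {
    "Cn", "Sex", "Post", "Pro", "Cap", "Ser",
    "Oct", "Ap", "Kal", "Tib", "St", "Pl",
}


def ends_with_known_abbreviation(sentence):
    """Return True if the last word of sentence is a listed abbreviation."""
    # Collect runs of letters, so trailing punctuation such as "." is ignored.
    words = re.findall(r"[^\W\d_]+", sentence)
    return bool(words) and words[-1] in KNOWN_TRIGGERS
```

Sentences flagged this way could be logged and retried after upgrading, rather than being sent to the server.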