diasks2 / pragmatic_segmenter

Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages.
MIT License

Test Suite #21

Closed by artreven 8 years ago

artreven commented 8 years ago

Dear Kevin,

Thank you for your tool and for the comparison with other tools. I was actually looking for test cases for WBD and for how different approaches perform on them. Although I found the test results, I could not find the "set of distinct edge cases" that you created. I would be glad to use your data for testing, since the Penn Treebank corpus is indeed too expensive for me.

Best, Artem

diasks2 commented 8 years ago

Hi Artem,

You can find the full test suite in this folder: https://github.com/diasks2/pragmatic_segmenter/tree/master/spec/pragmatic_segmenter/languages

Here is a link to a txt file with the distinct edge cases, and here is a link to a Ruby RSpec file.

Is that what you are looking for?

artreven commented 8 years ago

Kevin,

Yes, this is what I was looking for. Thanks for the quick reply and for pointing out the location.