CAMeL-Lab / camel_tools

A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
MIT License

Sort by lex probability #106

Closed: go-inoue closed this 1 year ago

go-inoue commented 1 year ago

This pull request improves the sorting function for the BERT disambiguation component in MSA and EGY.

We now sort analyses by three keys in order: first by score, then alphabetically by diac, and finally by lex_logprob. The only change from the previous approach is the additional lex_logprob key, which breaks ties between analyses that share the same score and diac. It currently works only with the MSA and EGY databases, because the GLF and LEV databases lack the lex_logprob feature.
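The three-key ordering described above can be sketched as follows. This is a minimal illustration, not the code from this pull request: the function name `sort_scored_analyses`, the `(score, analysis_dict)` tuple layout, and the sample data are hypothetical. It also assumes that higher scores and higher lex_logprob values should come first, and that an analysis without lex_logprob (as in the GLF and LEV databases) sorts after those that have it.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical layout: each candidate is a (score, analysis) pair,
# where analysis is a dict of features such as 'diac' and 'lex_logprob'.
ScoredAnalysis = Tuple[float, Dict[str, Any]]


def sort_scored_analyses(scored: List[ScoredAnalysis]) -> List[ScoredAnalysis]:
    """Sort by score (descending), then diac (ascending), then
    lex_logprob (descending, assumed)."""

    def key(item: ScoredAnalysis):
        score, analysis = item
        # Assumption: a missing lex_logprob is treated as -inf, so such
        # analyses lose ties against analyses that do carry the feature.
        lex_logprob = analysis.get('lex_logprob', float('-inf'))
        return (-score, analysis.get('diac', ''), -lex_logprob)

    return sorted(scored, key=key)


# Made-up candidates for illustration only.
candidates = [
    (0.9, {'diac': 'b', 'lex': 'x', 'lex_logprob': -2.0}),
    (0.9, {'diac': 'a', 'lex': 'y', 'lex_logprob': -5.0}),
    (0.9, {'diac': 'a', 'lex': 'z', 'lex_logprob': -1.0}),
    (1.0, {'diac': 'c', 'lex': 'w'}),
]
ranked = sort_scored_analyses(candidates)
ranked_lex = [analysis['lex'] for _, analysis in ranked]
```

Packing all three keys into one tuple keeps the ordering in a single place. Because the first two keys are unchanged from the earlier behaviour, lex_logprob only affects candidates that were previously tied on both score and diac.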