glossarist / iev-data


Rework term attributes parser #75

Closed: skalee closed this pull request 3 years ago

skalee commented 3 years ago

The term attributes parser has been extracted into a separate class. Its code quality has been improved, and tests have been added. The regular expressions have been cleaned up, which fixed many previously unreported bugs.
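To illustrate what a standalone, regex-based attributes parser of this kind might look like, here is a minimal sketch. It is not the code from this pull request: the class names `TermAttributesParser` and `TermAttributes`, the recognized tokens (`m`, `f`, `n`, `sg`, `pl`), and the angle-bracket syntax for usage information are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermAttributes:
    """Hypothetical result object for one parsed attribute string."""
    gender: Optional[str] = None
    plurality: Optional[str] = None
    usage_info: Optional[str] = None
    unparsed: List[str] = field(default_factory=list)


class TermAttributesParser:
    """Hypothetical sketch; the token vocabulary below is assumed."""

    # Usage information written in angle brackets, e.g. "<some context>".
    USAGE_INFO_RE = re.compile(r"<([^>]*)>")
    # Whitespace and/or commas separate the remaining tokens.
    SEPARATOR_RE = re.compile(r"[\s,]+")
    GENDERS = {"m", "f", "n"}
    PLURALITIES = {"sg": "singular", "pl": "plural"}

    def parse(self, src: str) -> TermAttributes:
        attrs = TermAttributes()

        # Pull out bracketed usage info first so its spaces don't split it.
        match = self.USAGE_INFO_RE.search(src)
        if match:
            attrs.usage_info = match.group(1).strip()
            src = src[:match.start()] + src[match.end():]

        # Classify each remaining token; keep unknown ones for inspection.
        for token in self.SEPARATOR_RE.split(src.strip()):
            if not token:
                continue
            if token in self.GENDERS:
                attrs.gender = token
            elif token in self.PLURALITIES:
                attrs.plurality = self.PLURALITIES[token]
            else:
                attrs.unparsed.append(token)
        return attrs
```

Keeping unrecognized tokens in `unparsed` rather than silently dropping them makes it easy to write tests that flag inputs the regular expressions do not yet cover.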

This pull request does not solve all the issues with the term parser, but it is a good foundation to build on.

skalee commented 3 years ago

@ronaldtse If a term attribute specifies gender, then we always set plurality as well. For instance, if the term attribute is f, then we set gender = f and plurality = singular. If the term attribute does not specify gender, e.g. <some context>, then we set gender = nil and plurality = nil. Is this correct?
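The rule in the question above can be written as a small function. This is a hypothetical sketch in Python (so Ruby's `nil` becomes `None`); the name `apply_plurality_default` and the string `"singular"` are assumptions. The question does not cover a plurality marker given without a gender, so the sketch simply leaves that case as parsed.

```python
from typing import Optional, Tuple


def apply_plurality_default(
    gender: Optional[str], plurality: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (gender, plurality) after applying the proposed default.

    If a gender was parsed but no plurality was, plurality defaults to
    "singular". Otherwise both values are returned unchanged, so an
    attribute with neither (e.g. "<some context>") stays (None, None).
    """
    if gender is not None and plurality is None:
        return gender, "singular"
    return gender, plurality
```

For example, `apply_plurality_default("f", None)` gives `("f", "singular")`, while `apply_plurality_default(None, None)` gives `(None, None)`.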

skalee commented 3 years ago

@ronaldtse FYI, this pull request fixes numerous obvious bugs, especially in parsing gender or plurality, but the diff of the generated concepts was so large (I estimate 5-10% of all concepts) that I decided not to review it all. Hence, there is a chance that I introduced a few new bugs.

ronaldtse commented 3 years ago

@skalee This is much cleaner. Thank you! I will post the question as a new issue to follow up with IEC.