comtravo / ctparse

Parse natural language time expressions in python
https://www.comtravo.com
MIT License
131 stars 23 forks

allow overlapping regex matches #88

Closed sebastianmika closed 4 years ago

sebastianmika commented 4 years ago

Until now, ctparse did not create initial tokens from overlapping regular expression matches. However, we realized that doing so incurs only a small speed penalty while improving resolution accuracy.

Example of the problem: the rule ruleDDMMYYYY matches twice in this string:

03.04-05.04.2021

Once as 03.04-05 and once as 05.04.2021. If only non-overlapping matches are generated, only the first "token" is produced, so ctparse finds a resolution for 03.04-05.04., which ignores the year. Allowing overlapping matches generates two "tokens", the one above and 05.04.2021, giving the correct resolution.
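The difference can be reproduced outside ctparse with a minimal sketch. The pattern below is a simplified stand-in for a DD.MM.YYYY rule, not ctparse's actual regex, and the helper names `non_overlapping` and `overlapping` are made up for illustration. It uses only the standard `re` module and emulates overlapping matching with a capturing lookahead; the third-party `regex` package offers an `overlapped=True` argument on `findall`/`finditer` for the same purpose.

```python
import re

# Simplified stand-in for a DD.MM.YYYY rule (an assumption for illustration,
# not the regex ctparse actually uses).
DATE = r"\d{1,2}[./-]\d{1,2}[./-](?:\d{4}|\d{2})(?!\d)"
# Guard: a match may not start inside a number or directly after a dot.
GUARD = r"(?<![\d.])"


def non_overlapping(text):
    """Standard scan: each match consumes its characters."""
    return [m.group(0) for m in re.finditer(GUARD + DATE, text)]


def overlapping(text):
    """Try the rule at every position via a zero-width capturing lookahead,
    so a later match may reuse characters of an earlier one."""
    return [m.group(1) for m in re.finditer(GUARD + "(?=(" + DATE + "))", text)]


text = "03.04-05.04.2021"
print(non_overlapping(text))  # ['03.04-05']
print(overlapping(text))      # ['03.04-05', '05.04.2021']
```

With the non-overlapping scan, the characters `05` are consumed by the first match, so `05.04.2021` is never proposed as a token. The overlapping scan yields both candidates and leaves the choice between them to the later resolution step.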