Until now `ctparse` did not create initial tokens from overlapping matches of regular expressions. However, we realized that doing so incurs only a small speed penalty while improving resolution accuracy.
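For readers unfamiliar with the distinction, here is a generic illustration using only Python's standard `re` module (it is not `ctparse` code). `re.finditer` returns non-overlapping matches, so a character consumed by one match can never start or belong to another. A common trick to get overlapping matches is to wrap the pattern in a zero-width lookahead with a capturing group:

```python
import re

text = "2021"

# re.finditer resumes scanning after the end of each match, so the
# two-digit pattern never reuses a character: "20", then "21".
non_overlapping = [m.group(0) for m in re.finditer(r"\d{2}", text)]

# A zero-width lookahead consumes no characters, so the scan advances one
# position at a time and captures every two-digit window, including "02",
# which straddles the two non-overlapping matches.
overlapping = [m.group(1) for m in re.finditer(r"(?=(\d{2}))", text)]

print(non_overlapping)  # ['20', '21']
print(overlapping)      # ['20', '02', '21']
```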
Example of the problem: the rule `ruleDDMMYYYY` matches twice in this string:
`03.04-05.04.2021`
Once as `03.04-05` and once as `05.04.2021`. If only non-overlapping matches are generated, then only the first "token" is created, and `ctparse` finds a resolution for `03.04-05.04`, which ignores the year. Allowing overlapping matches generates two "tokens", the one above and `05.04.2021`, which gives the correct resolution.
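The sketch below reproduces this example with the standard `re` module. The pattern `DDMMYYYY` is a hypothetical, simplified stand-in for `ruleDDMMYYYY` (not the library's actual regex), and trying the pattern at every start position is just one way to collect overlapping matches, not necessarily how `ctparse` does it. The helper names `non_overlapping` and `overlapping` are likewise illustrative.

```python
import re

# Hypothetical, simplified stand-in for ruleDDMMYYYY (NOT ctparse's regex):
# day, month and a 2-4 digit year separated by ".", "/" or "-".
# The lookbehind stops a match from starting inside a number or right
# after a "."; the lookahead stops the year from ending inside a number.
DDMMYYYY = re.compile(r"(?<![\d.])(\d{1,2})[./-](\d{1,2})[./-](\d{2,4})(?!\d)")


def non_overlapping(pattern, text):
    """Matches as returned by re.finditer: each scan resumes after the previous match."""
    return [m.group(0) for m in pattern.finditer(text)]


def overlapping(pattern, text):
    """Try the pattern at every start position, so matches may share characters."""
    found = []
    for pos in range(len(text)):
        # Pattern.match with a pos argument still lets the lookbehind
        # inspect the characters before pos.
        m = pattern.match(text, pos)
        if m:
            found.append(m.group(0))
    return found


text = "03.04-05.04.2021"
print(non_overlapping(DDMMYYYY, text))  # ['03.04-05']
print(overlapping(DDMMYYYY, text))      # ['03.04-05', '05.04.2021']
```

With non-overlapping matching, `05.04.2021` is never tried because its first characters were already consumed by `03.04-05`; scanning every start position recovers it as a second token.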