geohci / edit-types

Edit diffs and type detection for Wikipedia
MIT License

Simplified regex #59

Closed: Amamgbu closed 2 years ago

Amamgbu commented 2 years ago

The new regex fails the test for complicated sections. Not sure why it counts the edit as a sentence change/remove. The previous regex passes the test.

geohci commented 2 years ago

Looks like it's because the new regex doesn't just skip over <number>.<number>, it also skips over <number>. on its own. The reason the test fails is that one of the sentences ends with a date, so the text is not split correctly: (He died at Vienna on 21 October 1762.\n\n\n\nAigen was born in Olomouc on 8 October 1685, the son of a goldsmith). So yay, our tests worked! I think we can probably abandon this PR then -- I don't see a great solution and, like you said, the previous regex worked.
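
To make the failure mode concrete, here is a minimal sketch. The two patterns are hypothetical stand-ins, not the regexes from this PR or from the library. One skips a period only when a digit follows it (a decimal such as 3.14). The other also skips any period that directly follows a digit, which swallows the sentence boundary after a year.

```python
import re

# Hypothetical patterns for illustration only; not the regexes from the PR.
# Treat a period as a sentence end unless a digit follows it (protects "3.14").
DECIMAL_ONLY = re.compile(r"\.(?!\d)")
# Over-broad variant: also refuse to split when a digit precedes the period,
# so a sentence ending in a year ("1762.") is no longer terminated.
ANY_NUMBER = re.compile(r"(?<!\d)\.(?!\d)")


def split_sentences(text, terminator):
    """Split text on the terminator pattern and drop empty pieces."""
    return [part.strip() for part in terminator.split(text) if part.strip()]


# Test string taken from the discussion above.
text = (
    "He died at Vienna on 21 October 1762.\n\n\n\n"
    "Aigen was born in Olomouc on 8 October 1685, the son of a goldsmith"
)

decimal_only = split_sentences(text, DECIMAL_ONLY)
any_number = split_sentences(text, ANY_NUMBER)

print(len(decimal_only))  # two sentences: the date still ends the first one
print(len(any_number))    # one sentence: the boundary after "1762." is lost
```

With the over-broad pattern the two sentences merge into one, so a diff would presumably see a changed or removed sentence rather than two unchanged ones, which matches the test failure reported above.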