Open thomasbird opened 3 years ago
Yes, that should be a really good approach. It seems `regex` is backwards compatible with `re`, so we can swap it in!
We have to figure out exactly how many errors to allow, and perhaps default to 0 to stay backwards compatible, but I can visualise every detector that detects `RegexFilth` having an 'exact' regex and its approximate counterpart.
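To make the "default to 0 errors" idea concrete, here is a rough sketch of how an exact pattern could be turned into its approximate counterpart. The helper names `approximate` and `compile_pattern` and the `max_errors` parameter are hypothetical, not anything that exists in the codebase. The `{e<=N}` suffix is the fuzzy-matching syntax from the `regex` package. The lazy import is only there so the sketch runs without the third-party dependency when no errors are allowed; a real change would presumably import `regex` unconditionally.

```python
import re


def approximate(pattern: str, max_errors: int = 0) -> str:
    """Return `pattern` wrapped so that `regex` tolerates up to `max_errors` errors.

    Hypothetical helper: with max_errors == 0 the pattern is returned unchanged,
    which keeps the current exact-matching behaviour as the default.
    """
    if max_errors < 0:
        raise ValueError("max_errors must be non-negative")
    if max_errors == 0:
        return pattern
    # {e<=N} permits at most N errors (insertions, deletions or substitutions)
    # in the preceding group when compiled with the `regex` package.
    return f"(?:{pattern}){{e<={max_errors}}}"


def compile_pattern(pattern: str, max_errors: int = 0):
    """Compile an exact pattern, or its fuzzy counterpart if errors are allowed."""
    if max_errors == 0:
        # Exact matching: the standard library is enough.
        return re.compile(pattern)
    import regex  # third-party: pip install regex

    # BESTMATCH asks for the match with the fewest errors, not the first one found.
    return regex.compile(approximate(pattern, max_errors), regex.BESTMATCH)
```

A detector could then keep its existing pattern and expose the number of allowed errors as an option, e.g. `compile_pattern(some_pattern, max_errors=1)`. Whether a single error budget suits every detector is still an open question.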
Due to typos or OCR errors, regex patterns may not always match when they probably should, e.g. typing a capital O instead of a zero in a British postcode, where letters and numbers are not usually interchangeable.
It might be interesting to allow regexes to be matched fuzzily, and the package `regex` allows this! https://pypi.org/project/regex/#approximate-fuzzy-matching-hg-issue-12-hg-issue-41-hg-issue-109
We should investigate its use instead of the built-in `re`.
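A small illustration of the postcode case, as a sketch only. The `POSTCODE` pattern is a deliberately simplified stand-in (an assumption for this example, not the pattern any detector actually uses), and `{s<=1}` is the `regex` package's syntax for allowing at most one substitution. The fuzzy part is skipped if `regex` is not installed.

```python
import re

# Simplified, illustrative UK postcode pattern (assumption, not a real detector's regex).
POSTCODE = r"[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}"

correct = "SW1A 0AA"
typo = "SW1A OAA"  # capital O typed instead of zero

print(re.search(POSTCODE, correct))  # matches
print(re.search(POSTCODE, typo))     # None: "O" is not a digit

try:
    import regex  # third-party: pip install regex
except ImportError:
    regex = None

if regex is not None:
    # Allow at most one substituted character anywhere in the postcode.
    fuzzy = regex.compile(f"(?:{POSTCODE}){{s<=1}}", regex.BESTMATCH)
    match = fuzzy.search(typo)
    if match is not None:
        # fuzzy_counts is (substitutions, insertions, deletions)
        print(match.group(0), match.fuzzy_counts)
```

Restricting the errors to substitutions (`s`) rather than any error (`e`) seems like a reasonable fit for the O/0 case, since the character count is unchanged, but insertions and deletions may matter more for OCR output.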