Closed GoogleCodeExporter closed 9 years ago
You'll need to bear in mind that it would report the total number of
substitutions, insertions and deletions that it did in order for it to match,
which would not necessarily be the minimum.
For example, when trying this:
>>> match(r'(?:cats|cat){e<=1}', 'cat')
<regex.Match object; span=(0, 3), match='cat', fuzzy_counts=(0, 0, 1)>
it would report that it matches with one deletion.
This is because, when faced with alternatives, it tries them in sequence, so it
will try to match "cats" first, which it can do with one deletion (the "s").
It doesn't try every possibility by default (that could take a long time!), so
it won't necessarily find the minimum edit distance.
However, you could use the ENHANCEMATCH or BESTMATCH flag to look for a better
fit:
>>> match(r'(?e)(?:cats|cat){e<=1}', 'cat')
<regex.Match object; span=(0, 3), match='cat'>
I'm not sure what the best arrangement would be on the match object. A single
attribute called, say, 'fuzzy_counts' as above? Three separate attributes
called 'substitutions', 'insertions' and 'deletions'?
Original comment by re...@mrabarnett.plus.com
on 28 Mar 2014 at 6:42
[deleted comment]
[deleted comment]
I think the fuzzy_count attribute in the example will suffice, thanks!
Could you explain a little about the difference of the oridinary, enhancematch
and best match? Do they use different search algorithms? It's better to give
the running complexity in the documentation.
Original comment by silversn...@gmail.com
on 29 Mar 2014 at 2:38
The feature isn't available at the moment, but should be in the next release.
I'm right in the middle of another issue at the moment.
There's a brief explanation of the ENHANCEMATCH and BESTMATCH flags in the
"Flags" section here:
http://code.google.com/p/mrab-regex-hg/wiki/GeneralDetails
Original comment by re...@mrabarnett.plus.com
on 29 Mar 2014 at 2:40
Thanks for your efforts!
Original comment by silversn...@gmail.com
on 29 Mar 2014 at 2:43
Added in regex 2014.04.10.
Original comment by re...@mrabarnett.plus.com
on 10 Apr 2014 at 7:54
Hi! Could I check if this is expected behavior or if I'm misunderstanding
something?
1) fuzzy_counts don't include substitutions when ENHANCEMATCH or BESTMATCH is on
>>> re.match(r'(?:cats|cat){e<=1}', 'caz')
<regex.Match object; span=(0, 3), match='caz', fuzzy_counts=(1, 0, 0)>
>>> re.match(r'(?e)(?:cats|cat){e<=1}', 'caz')
<regex.Match object; span=(0, 3), match='caz'>
>>> re.match(r'(?b)(?:cat){e<=1}', 'caz')
<regex.Match object; span=(0, 3), match='caz'>
2) Insertions are not always included in fuzzy_counts
>>> re.match(r'(?:cats){e<=2}', 'c ats')
<regex.Match object; span=(0, 5), match='c ats', fuzzy_counts=(1, 1, 0)>
#I understand the edit distance may not be minimum, assume it is matching 'c
ats' to '-cats' or 'c-ats' or some such thing?
>>> re.match(r'(?b)(?:cats){e<=2}', 'c ats') #minimum edit distance
<regex.Match object; span=(0, 5), match='c ats', fuzzy_counts=(0, 1, 0)>
>>> re.match(r'(?b)(?:cats){e<=2}', 'c a ts') #why edit distance 0?
<regex.Match object; span=(0, 6), match='c a ts'>
Original comment by kush...@gmail.com
on 23 May 2014 at 6:18
Really you should've created a new issue for a bug because this issue is for an
enhancement that has been completed.
It _is_ a bug, though.
Original comment by re...@mrabarnett.plus.com
on 23 May 2014 at 5:09
Fixed in regex-2014.05.23.
Original comment by re...@mrabarnett.plus.com
on 23 May 2014 at 8:26
Original issue reported on code.google.com by
silversn...@gmail.com
on 28 Mar 2014 at 9:50