LibreCat / Catmandu

Catmandu - a data processing toolkit
https://librecat.org

regex characters and special characters mixed in Catmandu::Fix::Parser #124

Closed nicolasfranck closed 9 years ago

nicolasfranck commented 9 years ago

The methods Catmandu::Fix::Parser::SingleQuotedString::reify and Catmandu::Fix::Parser::DoubleQuotedString::reify

replace escape sequences with their interpreted equivalents,

e.g. \n => "\n", \t => "\t", \b => "\b".

Afterwards, these special characters are used in regular expressions. But \b means "word boundary" in a regular expression, while it means "backspace" within a double-quoted string.
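The clash can be reproduced outside Catmandu. The sketch below uses Python rather than Perl purely as an illustration, since both give \b the same two meanings. The function `unescape` is a hypothetical stand-in for what the reify methods do; it is not Catmandu's code.

```python
import re

# Hypothetical stand-in for reify: interpret backslash-n, -t and -b escapes.
ESCAPES = {"n": "\n", "t": "\t", "b": "\b"}

def unescape(raw: str) -> str:
    return re.sub(r"\\([ntb])", lambda m: ESCAPES[m.group(1)], raw)

raw_pattern = r"\btest"          # what the user typed in the fix
reified = unescape(raw_pattern)  # backslash-b has become a backspace (0x08)

# The raw pattern matches "test" at a word boundary.
raw_matches = re.search(raw_pattern, "a test") is not None

# The reified pattern requires a literal backspace before "test".
reified_matches = re.search(reified, "a test") is not None

print(repr(reified), raw_matches, reified_matches)
```

Once the escape has been interpreted, the regex engine never sees the backslash, so the word-boundary meaning is lost before the pattern is compiled.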

Try this in your fix:

if all_match('test','\btest')
    add_field('a','b')
end

This leads to the following Perl code (note the missing opening "/" in the regex):

(..)
unless ( is_value( $__0->{"test"} )
    && $__0->{"test"} =~ test/ )
{
    last __FIX__0;
}
(..)
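One plausible reading of the missing "/", assuming the generated code contains a raw backspace character right after the opening slash, is that the terminal applies the backspace when the code is printed. The Python sketch below imitates that with a simplified, hypothetical `render_like_terminal` helper; it is an illustration, not part of Catmandu.

```python
def render_like_terminal(text: str) -> str:
    """Simplified terminal model: a backspace removes the previous character."""
    shown = []
    for ch in text:
        if ch == "\b":
            if shown:
                shown.pop()
        else:
            shown.append(ch)
    return "".join(shown)

# Assumed content of the generated regex: "/" + backspace + "test/".
generated = '$__0->{"test"} =~ /\btest/'

print(repr(generated))                  # repr() keeps the control character visible
print(render_like_terminal(generated))  # what a terminal would appear to show
```

Inspecting the generated code with a tool that escapes control characters makes the stray backspace visible.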
nicolasfranck commented 9 years ago

Maybe this can help:

if all_match('test',regex('\btest'))

end

If the regex is wrapped in a "regex" function call, the parser can distinguish regex escape sequences from string escape sequences.
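A minimal sketch of that idea, written in Python for illustration only: the hypothetical `reify_argument` leaves the contents of `regex('...')` untouched for the regex engine and interprets escapes only in plain quoted strings. The names and the toy argument syntax are assumptions, not Catmandu's parser.

```python
import re

ESCAPES = {"n": "\n", "t": "\t", "b": "\b"}

def reify_string(raw: str) -> str:
    """Interpret backslash-n, -t and -b as string escapes."""
    return re.sub(r"\\([ntb])", lambda m: ESCAPES[m.group(1)], raw)

def reify_argument(source: str):
    """Compile regex('...') as written; unescape a plain '...' string."""
    wrapped = re.fullmatch(r"regex\('(.*)'\)", source)
    if wrapped:
        # Pass the pattern through unchanged so \b stays a word boundary.
        return re.compile(wrapped.group(1))
    quoted = re.fullmatch(r"'(.*)'", source)
    if quoted is None:
        raise ValueError(f"unsupported argument: {source!r}")
    return reify_string(quoted.group(1))

pattern = reify_argument(r"regex('\btest')")
plain = reify_argument(r"'a\tb'")
print(pattern.search("a test") is not None, repr(plain))
```

The design choice is that the argument's syntactic form, not its content, decides which escape rules apply.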

I'll make a branch if necessary.

nics commented 9 years ago

Fixed in dev, commit d9fb722b6a1cb9757e9a9d666481c64b55d6b98a (if the pattern is a single-quoted string).