InseeFr / Trevas

Transformation engine and validator for statistics.
MIT License

Regex operations #334

Open NicoLaval opened 6 months ago

NicoLaval commented 6 months ago

@noahboerger you reported:

Could you clarify, please?

noahboerger commented 5 months ago

This note comes from the BdI test cases, specifically the one under "string/pattern_replacement_3".

There, the replace function is called with the pattern [a-e-i-o-u], although the intent is to replace only the letters a, e, i, o, u. The pattern looks odd to me, so I changed it to [a|e|i|o|u] to get the expected result.
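To make the difference between these character classes concrete, here is a minimal sketch. It assumes the patterns are evaluated with `java.util.regex`, which may not be what either Trevas or the BdI engine actually does; the class `VowelClassDemo` and its `replace` helper are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class VowelClassDemo {

    // Hypothetical helper: replaces every match of `pattern` in `input`.
    // Assumption: the engine delegates to java.util.regex.
    static String replace(String input, String pattern, String replacement) {
        return Pattern.compile(pattern).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "a|b-u";

        // Plain vowel class: only 'a' and 'u' are replaced.
        System.out.println(replace(input, "[aeiou]", "X"));     // X|b-X

        // Inside brackets '|' is a literal, so the pipe is replaced as well.
        System.out.println(replace(input, "[a|e|i|o|u]", "X")); // XXb-X

        // The leading "a-e" is read as a range, so 'b' is matched too.
        System.out.println(replace(input, "[a-e-i-o-u]", "X"));
    }
}
```

Under that assumption, `[aeiou]` is the class that matches exactly the five vowels. `[a|e|i|o|u]` gives the same result on inputs without a `|` character, which may be why it satisfied the test case.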

It was more of a note on my side: maybe the BdI engine and Trevas use different pattern syntaxes, or something is wrong with the test case itself. Nothing needs to be adjusted in Trevas.

So I would propose closing this issue.

hadrienk commented 5 months ago

What does the spec say about the regexp syntax?

noahboerger commented 5 months ago

The reference manual of match_characters provides the following information (p. 116):

match_characters returns TRUE if op matches the regular expression regexp, FALSE otherwise. The string regexp is an Extended Regular Expression as described in the POSIX standard. Different implementations of VTL may implement different versions of the POSIX standard therefore it is possible that match_characters may behave in slightly different ways.

For replace, no explicit reference to a pattern standard seems to be made, and the examples contain only simple string values.
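As an illustration of how the regexp flavour can change results, the sketch below uses a POSIX bracket expression. It assumes `java.util.regex` as the underlying engine, which is an assumption about the implementation and not something the manual states; `PosixClassDemo` and `matches` are hypothetical names.

```java
import java.util.regex.Pattern;

public class PosixClassDemo {

    // Hypothetical stand-in for match_characters(op, regexp),
    // assuming the engine delegates to java.util.regex.
    static boolean matches(String regexp, String op) {
        return Pattern.matches(regexp, op);
    }

    public static void main(String[] args) {
        // In POSIX ERE, [[:alpha:]] matches any alphabetic character.
        // java.util.regex instead reads it as a nested class containing
        // the literal characters ':', 'a', 'l', 'p', 'h'.
        System.out.println(matches("[[:alpha:]]", "z")); // false
        System.out.println(matches("[[:alpha:]]", "a")); // true

        // The java.util.regex spelling of the POSIX alphabetic class.
        System.out.println(matches("\\p{Alpha}", "z"));  // true
    }
}
```

A pattern written against the POSIX standard can therefore silently match a different set of strings on such an engine, without raising any error.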

NicoLaval commented 5 months ago

It's a problem.

I opened an issue in the TF repo.