Problem with `tokenize` or regexs

rrthomas commented 7 months ago

I ran into this issue when the functx:lines function didn't seem to work for me, giving me an extra blank line after each line.

I can reproduce it with a simple example from https://www.altova.com/xpath-xquery-reference/fn-tokenize, which says:

For example: fn:tokenize("abracadabra", "(ab)|(a)") returns ("", "r", "c", "d", "r", "")

But with fontoxpath:

var fontoxpath = require("fontoxpath");

console.log(fontoxpath.evaluateXPathToStrings(
  'fn:tokenize("abracadabra", "(ab)|(a)")',
  null,
  undefined,
  undefined,
  { language: fontoxpath.evaluateXPath.XQUERY_3_1_LANGUAGE }
));

Output:

[
  '',          'ab',
  'undefined', 'r',
  'undefined', 'a',
  'c',         'undefined',
  'a',         'd',
  'ab',        'undefined',
  'r',         'undefined',
  'a',         ''
]

It looks like the capture groups are being incorrectly returned as part of the result of `tokenize`.

DrRataplan commented 6 months ago

Hey Reuben,

Sorry for the long wait! Many changes: I'm no longer with Fonto, but I'm still involved in development!

Got it: we use regular JS regexes here, which do indeed include capture groups in their output... I've made a fix, which I'll PR shortly!

Kind regards,

Martin
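
For readers following along, the behaviour described above can be reproduced in plain JavaScript without fontoxpath: `String.prototype.split` splices the capture groups of each match into its result, which gives the same shape as the output reported in this issue. The sketch below also shows one way to tokenize without being affected by capture groups. It is only an illustration, not the fix that went into fontoxpath; `tokenizeIgnoringCaptures` is a hypothetical name, and the sketch assumes the pattern is also a valid JavaScript regex and never matches the empty string.

```javascript
// Plain JS: split() inserts every capture group of each match into the result.
// Groups that did not participate in a match show up as undefined.
const viaSplit = "abracadabra".split(/(ab)|(a)/);
console.log(viaSplit);
// [ '', 'ab', undefined, 'r', undefined, 'a', 'c', undefined, 'a',
//   'd', 'ab', undefined, 'r', undefined, 'a', '' ]

// Hypothetical helper (not part of fontoxpath): collect the text between
// matches using only match positions, so capture groups are never emitted.
// Assumes `pattern` is valid JS regex syntax and has no zero-length matches.
function tokenizeIgnoringCaptures(input, pattern) {
  const re = new RegExp(pattern, "g");
  const tokens = [];
  let last = 0;
  for (const match of input.matchAll(re)) {
    // Text between the end of the previous match and the start of this one.
    tokens.push(input.slice(last, match.index));
    last = match.index + match[0].length;
  }
  // Whatever remains after the final match (possibly the empty string).
  tokens.push(input.slice(last));
  return tokens;
}

console.log(tokenizeIgnoringCaptures("abracadabra", "(ab)|(a)"));
// [ '', 'r', 'c', 'd', 'r', '' ]
```

The second result matches the sequence given in the Altova reference quoted earlier in this issue. Working from match positions avoids having to rewrite the pattern into non-capturing groups, which would be harder to do safely for arbitrary input patterns.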

rrthomas commented 6 months ago

Many thanks @DrRataplan!

FontoXML / fontoxpath

Problem with `tokenize` or regexs #635