Open djbpitt opened 5 years ago
@djbpitt to properly assess this we need more info, including index configuration, sample data, etc. Could you try to expand this into either a self-contained XQSuite test that reproduces the problem, or alternatively share a minimal xar that contains all, and only, the files necessary to reproduce this? Thx
@djbpitt are you sure
let $bgTitles := $auxTitles[ngram:contains(*,$target)]/bg
is correct? At first glance it seems it should be
let $bgTitles := $auxTitles[ngram:contains(.,$target)]/bg
(. instead of *) unless you really mean any element under title? (Not that it explains your problem, just wondering...)
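To illustrate the distinction raised in this comment, here is a minimal sketch in plain XQuery. The title entry and its child element names are hypothetical, and no ngram index is involved; it only shows what each step selects when the context is a title element.

```xquery
xquery version "3.1";

(: Hypothetical title entry; the child element names are assumptions. :)
let $title := <title><bg>Zhitie</bg><en>Life</en></title>
return (
    (: * selects the child elements of the context node :)
    $title/*/name(),
    (: . selects the context node itself :)
    $title/./name()
)
```

Under these assumptions the first expression yields the names of the children (bg, en) and the second yields title, so a predicate written with * is evaluated against the child elements rather than the title as a whole.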
What is the problem
I am filtering a list of titles (in a single auxiliary document) to retain only those that contain a particular substring (using the ngram index). Once I have found the exact auxiliary titles that contain the target substring, I am using those exact titles to filter a collection of manuscript descriptions (each in a separate XML file) to keep only those that have a matching <title> element. I'm using ngram:contains(), rather than an explicit equality test, because it gets me case-insensitivity and yields the correct results (that is, substring matches don't contaminate the results).
It errors out with:
If I'm reading this correctly, ngram:contains() thinks that $title has a cardinality not of 1 (the actual cardinality of $title), but of 16 (the cardinality of $bgTitles, that is, of the sequence variable in the for statement, rather than of the range variable).
If I change the manuscript filtering to use general equality instead of ngram:contains(), as follows:
it returns the expected results.
What did you expect
I expect that the range variable in a for statement will always have a cardinality of 1, and that, therefore, ngram:contains() will never raise a cardinality error about its second argument when the second argument is the range variable in a for statement.
Describe how to reproduce or add a test
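As a reference point for the expected behavior, the following is a minimal sketch in plain XQuery. It does not use the ngram index or the original data; the bg elements and their values are hypothetical stand-ins. It only shows that the range variable of a for clause is bound to one item per iteration, whatever the size of the sequence it iterates over.

```xquery
xquery version "3.1";

(: Hypothetical stand-in data; element names and values are assumptions. :)
let $bgTitles := (
    <bg>Title one</bg>,
    <bg>Title two</bg>,
    <bg>Title three</bg>
)
return (
    (: The sequence variable holds three items. :)
    count($bgTitles),
    (: The range variable is bound to exactly one item per iteration. :)
    for $title in $bgTitles
    return count($title)
)
```

With this data the result is the sequence 3, 1, 1, 1: the first value is the cardinality of the sequence variable, and each later value is the cardinality of the range variable in one iteration.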
I would be happy to make my data available on request, but if the issue is, in fact, a bug in the implementation of ngram:contains(), it should be reproducible with other data.
Context information
Please always add the following information