eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
421 stars 179 forks source link

[BUG] Register binary attribute in collection.xconf.xsd #5432

Open joewiz opened 2 weeks ago

joewiz commented 2 weeks ago

Describe the bug

PR #4541 added support for indexing fields as binary values, controlled by a new @binary attribute on the <field> element. The PR neglected to register this new attribute in collection.xconf.xsd.

As a result, eXide, for example, flags uses of this attribute. See https://github.com/eXist-db/exist/issues/5431#issuecomment-2317631397.

Expected behavior

The xconf schema should define the attribute and its allowed values.

To Reproduce

Validate this document from the linked issue against collection.xconf.xsd:

<collection xmlns="http://exist-db.org/collection-config/1.0">
  <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- Full-text indexing with Lucene -->
    <lucene>
      <!-- Elements upon which to build an index. -->
      <text qname="div">
        <field name="sortable" expression="./test/string()" type="xs:string" binary="yes"/>
      </text>
    </lucene>
  </index>
</collection>

Via the linked issue:

If I try to apply the collection.xconf with eXide this error occurs:

Failed to apply configuration: DocValuesField "sortable" appears more than once in this document (only one value is allowed per field)

Screenshots

n/a

Context (please always complete the following information)

Additional context

joewiz commented 2 weeks ago

I'd be happy to submit a PR adding field/@binary to https://github.com/eXist-db/exist/blob/develop/schema/collection.xconf.xsd#L179-L186.

I just need to know the allowed values.

The example code on the Lucene documentation page shows binary="yes", but the tests added in the PR all say binary="true". I take it from https://github.com/eXist-db/exist/pull/4541/files#diff-bdeb3fc906efd9177af97cd1351231dc0e9a9aa14e477246b82b57483d6618e2R113 that either "true" or "yes" are accepted. It seems that any other value is treated as false (same as omitting the attribute).

(The other attributes in this schema that take yes - e.g., caseOpt - don't offer true|false as options. So the binary attribute appears to be an outlier in accepting this larger set of values in this schema.)

If that's a correct interpretation of the code, I'd add a binaryOpt entry to fieldAttrType, defined as follows:

<xs:attributeGroup name="binaryOpt">
    <xs:attribute name="binary" use="optional">
        <xs:simpleType>
            <xs:restriction base="xs:token">
                <xs:enumeration value="yes">
                    <xs:annotation>
                        <xs:documentation>Index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="true">
                    <xs:annotation>
                        <xs:documentation>Index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="no">
                    <xs:annotation>
                        <xs:documentation>Do not index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
                <xs:enumeration value="false">
                    <xs:annotation>
                        <xs:documentation>Do not index as a binary field</xs:documentation>
                    </xs:annotation>
                </xs:enumeration>
            </xs:restriction>
        </xs:simpleType>
    </xs:attribute>
</xs:attributeGroup>

Is this a correct interpretation of what the code supports / implies?

dizzzz commented 2 weeks ago

As I read the code, you are correct. @adamretter

joewiz commented 2 weeks ago

@dizzzz Thanks!