eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
421 stars 179 forks source link

xs:time and xs:dateTime stored in Lucene field will be normalised to UTC #4535

Open line-o opened 2 years ago

line-o commented 2 years ago

Describe the bug

When one configures a Lucene Field index to be of type xs:dateTime or xs:time the timezone is normalised to UTC before the value is added to the index. Two entries that mark the same moment in time but in different timezones have a frequency of two in the index. I believe this is not a bug, but more a design decision that needs to be documented. But since it is undocumented developers might expect a different behaviour.

My tests whether sorting of xs:time takes the timezone into account were inconclusive. It appears that only the (normalised) time portion is used. That would mean that the sorting of field values is comparable and not affected by the current implementation.

Since the timezone information is gone upon retrieval it it is impossible to search for, filter or group by timezones with only the values stored in the index.

NOTE: The original information can still be retrieved from the hit itself.

Expected behavior

The fact that values typed as xs:time or xs:dateTime are normalised to UTC needs to be documented. It could be worth to add an example how to store timezone information in a separate field or facet to enable search, filtering and grouping.

Alternatively, a way to efficiently store and retrieve the timezone should be investigated.

To Reproduce

xquery version "3.1";

module namespace lfttt="http://exist-db.org/xquery/lucene-field-time-timezone-test";

declare namespace test="http://exist-db.org/xquery/xqsuite";

declare variable $lfttt:data := document {
<items>
    <item datetime="2022-08-07T10:34:56.789-02:00"/>
    <item datetime="2022-08-07T14:34:56.789+02:00"/>
    <item datetime="2022-08-07T12:34:56.789Z"/>
    <item datetime="2022-08-07T12:34:56.789"/>
    <item time="10:34:56.789-02:00"/>
    <item time="14:34:56.789+02:00"/>
    <item time="12:34:56.789Z"/>
<!--
    time-strings without a timezone are normalized
    against the current server timezone which can
    be different from UTC!
    see https://github.com/eXist-db/exist/issues/4536
    <item time="12:34:56.789"/>
-->
</items>
};

declare variable $lfttt:xconf :=
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <lucene>
            <text qname="item">
                <field name="datetime" expression="@datetime" type="xs:dateTime"/>
                <field name="time" expression="@time" type="xs:time"/>
            </text>
        </lucene>
    </index>
</collection>;

declare variable $lfttt:collection := "lucene-field-time-timezone-test";

declare
    %test:setUp
function lfttt:setup() {
    let $testCol := xmldb:create-collection("/db", $lfttt:collection)
    let $indexCol := xmldb:create-collection("/db/system/config/db", $lfttt:collection)
    return (
        xmldb:store("/db/" || $lfttt:collection, "test.xml", $lfttt:data),
        xmldb:store("/db/system/config/db/" || $lfttt:collection, "collection.xconf", $lfttt:xconf),
        xmldb:reindex("/db/" || $lfttt:collection)
    )
};

declare
    %test:tearDown
function lfttt:tearDown() {
    xmldb:remove("/db/" || $lfttt:collection),
    xmldb:remove("/db/system/config/db/" || $lfttt:collection)
};

declare
    %test:assertEquals("20220807123456789", "20220807123456789", "20220807123456789", "20220807123456789")
function lfttt:datetime-values-normalized() {
    collection("/db/" || $lfttt:collection)
        //item[
            ft:query(., "datetime:*", map{ 
                "leading-wildcard": "yes",
                "fields": "datetime"
            })]
        ! ft:field(., "datetime")
(:
    cannot cast stored values to xs:dateTime
    see https://github.com/eXist-db/exist/issues/4537
    ! ft:field(., "datetime", "xs:dateTime")
 :)
};

declare
    %test:assertEquals("12:34:56.789Z", "12:34:56.789Z", "12:34:56.789Z")
function lfttt:time-values-with-timezone() {
    collection("/db/" || $lfttt:collection)
        //item[
            ft:query(., "time:*", map{ 
                "leading-wildcard": "yes",
                "fields": "time"
            })]
        ! ft:field(., "time", "xs:time")
};

Tested on:

Additional context

line-o commented 2 years ago

I just tried to add a field that stores the timezone information

<field name="time-tz" 
    expression="fn:timezone-from-time(xs:time(@time))"
    type="xs:dayTimeDuration" />

And the test below passes showing that

Additional findings

declare
    %test:assertEquals("-PT2H", "PT0S", "PT2H")
function lfttt:timezone-values() {
    collection("/db/" || $lfttt:collection)
        //item[
            ft:query(., "time-tz:*", map{ 
                "leading-wildcard": "yes",
                "fields": "time-tz"
            })]
        ! ft:field(., "time-tz")
        => sort()
};

declare
    %test:assertTrue
function lfttt:timezones-values-can-be-cast-to-dayTimeDuration() {
    collection("/db/" || $lfttt:collection)
        //item[
            ft:query(., "time-tz:*", map{ 
                "leading-wildcard": "yes",
                "fields": "time-tz"
            })]
        ! ft:field(., "time-tz")
    => fold-left(true(), function ($result, $next) {
        $result and $next castable as xs:dayTimeDuration
    })
};