deephaven / deephaven-core

Deephaven Community Core
Other
255 stars 80 forks source link

WritableChunk sort documentation and implementation clarification #5824

Open devinrsmith opened 3 months ago

devinrsmith commented 3 months ago

As noted in #5820, WritableDoubleChunk#sort uses Arrays#sort(double[], ...) which does not sort NULL_DOUBLEs first; -Infinity is sorted earlier.

devinrsmith commented 3 months ago

It also appears that WritableCharChunk#sort is incorrect when offset != 0

devinrsmith commented 3 months ago

This may not be a bug, as it is "consistent" with the javadocs:

    /**
     * Sort this chunk in-place using Java's primitive defined ordering.
     * <p>
     * Of note is that nulls or NaNs are not sorted according to Deephaven ordering rules.
     */
    void sort();

In that case though, WritableCharChunk is incorrect because it does sort nulls first.

devinrsmith commented 3 months ago

Of note is that nulls or NaNs are not sorted according to Deephaven ordering rules.

Afaict, Arrays.sort does sort NaNs in the same way DH does - NaNs come last.

devinrsmith commented 3 months ago

Removing the SortFixup region from WritableCharChunk does not cause any tests to fail (normal, nor nightly). This surprises me very much; either, code downstream of WritableCharChunk.sort()

1) does their own fixup (not assuming that WritableCharChunk does it for them) 2) doesn't care about specifics of char sort (only cares about like values grouped together)

or, our testing in this regard is lacking (and we do have operations that depend on the current implementation of WritableCharChunk.sort).

devinrsmith commented 2 days ago

https://iceberg.apache.org/spec/#sorting is relevant info on how iceberg handles double. We may want to create an issue more generally about floating point (in)consistencies in DH.