deephaven / deephaven-core

Deephaven Community Core
Other
251 stars 80 forks source link

Python PartitionedTable.constituentFor wrapper issues #2469

Closed rcaudy closed 2 years ago

rcaudy commented 2 years ago

Bug Description:

There is a bug in test_partitioned_table.py for test_get_constituent. Excerpted from results of ./gradlew :Integrations:test-py-deephaven:

======================================================================
FAIL [0.013s]: test_get_constituent (tests.test_partitioned_table.PartitionedTableTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/python/tests/test_partitioned_table.py", line 73, in test_get_constituent
    self.assertIsNotNone(self.partitioned_table.get_constituent(keys))
AssertionError: unexpectedly None

----------------------------------------------------------------------

Adding some code to the Java side highlights that this is an issue with type conversions (Python types not matching the expected Java types). There is an additional mystery of why it fails for me, but not CI or @jmao-denver .

Version Information:

Git SHA: ef8560504bd9b650a8e0717aed8aad93a74261d3 (current head of deephaven/deephaven-core:main)

python3 -V -V
Python 3.8.9 (default, Apr 13 2022, 08:48:06)
[Clang 13.1.6 (clang-1316.0.21.2.5)]
java -version
openjdk version "11.0.14" 2022-01-18 LTS
OpenJDK Runtime Environment Zulu11.54+23-CA (build 11.0.14+9-LTS)
OpenJDK 64-Bit Server VM Zulu11.54+23-CA (build 11.0.14+9-LTS, mixed mode)
uname -a
Darwin megalodon.local 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T6000 arm64

Java Patch:

The following Java patch makes the issue evident:

Index: engine/api/src/main/java/io/deephaven/engine/table/PartitionedTable.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/engine/api/src/main/java/io/deephaven/engine/table/PartitionedTable.java b/engine/api/src/main/java/io/deephaven/engine/table/PartitionedTable.java
--- a/engine/api/src/main/java/io/deephaven/engine/table/PartitionedTable.java  (revision ef8560504bd9b650a8e0717aed8aad93a74261d3)
+++ b/engine/api/src/main/java/io/deephaven/engine/table/PartitionedTable.java  (date 1654198002759)
@@ -2,7 +2,6 @@

 import io.deephaven.api.SortColumn;
 import io.deephaven.api.TableOperations;
-import io.deephaven.api.agg.Partition;
 import io.deephaven.api.filter.Filter;
 import io.deephaven.base.log.LogOutputAppendable;
 import io.deephaven.engine.liveness.LivenessNode;
@@ -11,6 +10,7 @@
 import org.jetbrains.annotations.NotNull;

 import java.util.Collection;
+import java.util.List;
 import java.util.Set;
 import java.util.function.BinaryOperator;
 import java.util.function.UnaryOperator;
@@ -70,6 +70,14 @@
      */
     Set<String> keyColumnNames();

+    /**
+     * Get the {@link ColumnDefinition definitions} of all "key" columns that are part of
+     * {@code table().getDefinition()}. If there are no key columns, the result will be empty.
+     * 
+     * @return The key column definitions
+     */
+    List<ColumnDefinition<?>> keyColumnDefinitions();
+
     /**
      * <p>
      * Are the keys (key column values for a row considered as a tuple) in the underlying {@link #table() partitioned
Index: engine/table/src/main/java/io/deephaven/engine/table/impl/partitioned/PartitionedTableImpl.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/engine/table/src/main/java/io/deephaven/engine/table/impl/partitioned/PartitionedTableImpl.java b/engine/table/src/main/java/io/deephaven/engine/table/impl/partitioned/PartitionedTableImpl.java
--- a/engine/table/src/main/java/io/deephaven/engine/table/impl/partitioned/PartitionedTableImpl.java   (revision ef8560504bd9b650a8e0717aed8aad93a74261d3)
+++ b/engine/table/src/main/java/io/deephaven/engine/table/impl/partitioned/PartitionedTableImpl.java   (date 1654204908167)
@@ -24,9 +24,11 @@
 import io.deephaven.engine.table.iterators.ObjectColumnIterator;
 import io.deephaven.engine.updategraph.UpdateGraphProcessor;
 import io.deephaven.util.SafeCloseable;
+import io.deephaven.util.type.TypeUtils;
 import org.apache.commons.lang3.mutable.MutableInt;
 import org.apache.commons.lang3.mutable.MutableObject;
 import org.jetbrains.annotations.NotNull;
+import org.jetbrains.annotations.Nullable;

 import java.lang.ref.WeakReference;
 import java.util.*;
@@ -48,6 +50,7 @@

     private final Table table;
     private final Set<String> keyColumnNames;
+    private final List<ColumnDefinition<?>> keyColumnDefinitions;
     private final boolean uniqueKeys;
     private final String constituentColumnName;
     private final TableDefinition constituentDefinition;
@@ -79,6 +82,9 @@
             manage(this.table);
         }
         this.keyColumnNames = Collections.unmodifiableSet(new LinkedHashSet<>(keyColumnNames));
+        this.keyColumnDefinitions = keyColumnNames.stream()
+                .map(kcn -> table().getDefinition().getColumn(kcn))
+                .collect(Collectors.toUnmodifiableList());
         this.uniqueKeys = uniqueKeys;
         this.constituentColumnName = constituentColumnName;
         this.constituentDefinition = constituentDefinition;
@@ -100,6 +106,11 @@
         return keyColumnNames;
     }

+    @Override
+    public List<ColumnDefinition<?>> keyColumnDefinitions() {
+        return keyColumnDefinitions;
+    }
+
     @Override
     public boolean uniqueKeys() {
         return uniqueKeys;
@@ -310,7 +321,8 @@
         final List<MatchFilter> filters = new ArrayList<>(numKeys);
         final String[] keyColumnNames = keyColumnNames().toArray(String[]::new);
         for (int kci = 0; kci < numKeys; ++kci) {
-            filters.add(new MatchFilter(keyColumnNames[kci], keyColumnValues[kci]));
+            filters.add(new MatchFilter(keyColumnNames[kci],
+                    checkType(keyColumnDefinitions.get(kci), keyColumnValues[kci])));
         }
         final LivenessManager enclosingLivenessManager = LivenessScopeStack.peek();
         try (final SafeCloseable ignored = LivenessScopeStack.open()) {
@@ -331,6 +343,17 @@
         }
     }

+    private static Object checkType(@NotNull final ColumnDefinition keyColumnDefinition, @Nullable final Object key) {
+        if (key == null) {
+            return null;
+        }
+        if (TypeUtils.getBoxedType(keyColumnDefinition.getDataType()).isAssignableFrom(key.getClass())) {
+            return key;
+        }
+        throw new IllegalArgumentException(
+                "Invalid key input " + key + " (" + key.getClass() + ") for " + keyColumnDefinition);
+    }
+
     @Override
     public Table[] constituents() {
         final LivenessManager enclosingLivenessManager = LivenessScopeStack.peek();

More Revealing Error:

With that applied, the error becomes:

======================================================================
ERROR [0.008s]: test_get_constituent (tests.test_partitioned_table.PartitionedTableTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/python/tests/test_partitioned_table.py", line 73, in test_get_constituent
    self.assertIsNotNone(self.partitioned_table.get_constituent(keys))
  File "/python/deephaven/table.py", line 1443, in get_constituent
    j_table = self.j_partitioned_table.constituentFor(key_values)
RuntimeError: java.lang.IllegalArgumentException: Invalid key input 917 (class java.lang.Short) for ColumnDefinition {name=c, dataType=int, componentType=null, columnType=Normal}
        at io.deephaven.engine.table.impl.partitioned.PartitionedTableImpl.checkType(PartitionedTableImpl.java:354)
        at io.deephaven.engine.table.impl.partitioned.PartitionedTableImpl.constituentFor(PartitionedTableImpl.java:324)

----------------------------------------------------------------------

This shows pretty convincingly that the issue is type conversion (or the lack thereof).

devinrsmith commented 2 years ago

Your local python version shouldn't matter - it should be running the tests inside docker.

devinrsmith commented 2 years ago

I'm more worried why others aren't seeing this same error.

rcaudy commented 2 years ago

I can do this with groovy if I intentionally give the wrong type as input:

// Setup:
t=io.deephaven.csv.CsvTools.readCsv("/Users/rcaudy/dev/deephaven/community/current/core/py/server/tests/data/test_table.csv")
pt=t.partitionBy("c","e")
// This works:
c=pt.constituentFor(917, 167)
// This does not:
c=pt.constituentFor((short)917, 167)
rcaudy commented 2 years ago

So, one answer here might be to make MatchFilter more permissive about compatible numeric types, and downcast/upcast accordingly.

rcaudy commented 2 years ago

Alternatively, PartitionedTableImpl.constituentFor could do that, or the Python wrapper could try to adjust it. Meanwhile, we have no idea why this is breaking sometimes vs other times.

rcaudy commented 2 years ago

Interestingly, without my validation patch, short is fine from Groovy.

rcaudy commented 2 years ago

Not that interesting, it intentionally works that way:

    public static int[] getUnboxedIntArray(Object[] boxedArray) {
        final int[] result = new int[boxedArray.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = (boxedArray[i] != null ? (((Number) boxedArray[i]).intValue()) : NULL_INT);
        }
        return result;
    }
rcaudy commented 2 years ago

So now we need to understand why Python is causing issues in some cases that we would not expect.

rcaudy commented 2 years ago

I did some println style debugging in Python and Java.

This output is a little bit out of order, but Python thinks it is passing 917 and 167, which Java is getting as a Short 917 and a Byte -89:

[class java.lang.Short, class java.lang.Byte]
[917, -89]
[917]
[917]
[-89]
[-89]
  test_get_constituent (tests.test_partitioned_table.PartitionedTableTestCase) ... FAIL (0.008s)
before:  [917, 167]
after:  (917, 167)
rcaudy commented 2 years ago

This just looks like bad truncation to byte.

rcaudy commented 2 years ago

This seems to pass:

Index: py/jpy/src/main/c/jpy_jtype.c
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/py/jpy/src/main/c/jpy_jtype.c b/py/jpy/src/main/c/jpy_jtype.c
--- a/py/jpy/src/main/c/jpy_jtype.c (revision d7ec73b94db205df5a559a8ad881bb23ac20ac64)
+++ b/py/jpy/src/main/c/jpy_jtype.c (date 1654549792772)
@@ -448,14 +448,16 @@
 int JType_CreateJavaNumberFromPythonInt(JNIEnv* jenv, JPy_JType* type, PyObject* pyArg, jobject* objectRef)
 {
     jvalue value;
-    char b;
+    signed char b;
     short s;
     long i;
     long long j;
     j = JPy_AS_JLONG(pyArg);
     i = (int) j;
     s = (short) j;
-    b = (char) j;
+    b = (signed char) j;
+
+    fprintf(stderr, "j%d, i%d, s%d, b%d\n", j, i, s, b);

     if (i != j) {
         value.j = j;