Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

DataFrame fails on simple actions with casting BigInteger to Long #762

Open apatrida opened 1 week ago

apatrida commented 1 week ago

DataFrame fails on simple actions with casting BigInteger to Long.

For example, the MySQL performance schema table_handles table is defined as:

show create table performance_schema.table_handles;

CREATE TABLE `table_handles` (
  `OBJECT_TYPE` varchar(64) NOT NULL,
  `OBJECT_SCHEMA` varchar(64) NOT NULL,
  `OBJECT_NAME` varchar(64) NOT NULL,
  `OBJECT_INSTANCE_BEGIN` bigint unsigned NOT NULL,
  `OWNER_THREAD_ID` bigint unsigned DEFAULT NULL,
  `OWNER_EVENT_ID` bigint unsigned DEFAULT NULL,
  `INTERNAL_LOCK` varchar(64) DEFAULT NULL,
  `EXTERNAL_LOCK` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`OBJECT_INSTANCE_BEGIN`),
  KEY `OBJECT_TYPE` (`OBJECT_TYPE`,`OBJECT_SCHEMA`,`OBJECT_NAME`),
  KEY `OWNER_THREAD_ID` (`OWNER_THREAD_ID`,`OWNER_EVENT_ID`)
) ENGINE=PERFORMANCE_SCHEMA DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

In the dataframe:

DataFrame.getSchemaForSqlTable(dbConfig, "table_handles")

OBJECT_TYPE: String
OBJECT_SCHEMA: String
OBJECT_NAME: String
OBJECT_INSTANCE_BEGIN: Long
OWNER_THREAD_ID: Long?
OWNER_EVENT_ID: Long?
INTERNAL_LOCK: String?
EXTERNAL_LOCK: String?

Read the SQL table:

val dfLocksTable = DataFrame.readSqlTable(dbConfig, "table_handles")

...next cell

dfLocksTable.filter { OWNER_THREAD_ID != null }

results in error:

class java.math.BigInteger cannot be cast to class java.lang.Long (java.math.BigInteger and java.lang.Long are in module java.base of loader 'bootstrap')
java.lang.ClassCastException: class java.math.BigInteger cannot be cast to class java.lang.Long (java.math.BigInteger and java.lang.Long are in module java.base of loader 'bootstrap')
    at Line_24_jupyter._DataFrameType3_OWNER_THREAD_ID(Line_24.jupyter.kts:19)
    at Line_30_jupyter$res30$1.invoke(Line_30.jupyter.kts:6)
    at Line_30_jupyter$res30$1.invoke(Line_30.jupyter.kts:6)
    at org.jetbrains.kotlinx.dataframe.api.FilterKt.filter(filter.kt:38)
    at Line_30_jupyter.<init>(Line_30.jupyter.kts:6)
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
    at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.evalWithConfigAndOtherScriptsResults(BasicJvmScriptEvaluator.kt:122)
    at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke$suspendImpl(BasicJvmScriptEvaluator.kt:48)
    at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke(BasicJvmScriptEvaluator.kt)
    at kotlin.script.experimental.jvm.BasicJvmReplEvaluator.eval(BasicJvmReplEvaluator.kt:49)
    at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl$eval$resultWithDiagnostics$1.invokeSuspend(InternalEvaluatorImpl.kt:127)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
    at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:277)
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:95)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl.eval(InternalEvaluatorImpl.kt:127)
    at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$result$1.invoke(CellExecutorImpl.kt:79)
    at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$execute$1$result$1.invoke(CellExecutorImpl.kt:77)
    at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.withHost(ReplForJupyterImpl.kt:758)
    at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl.execute(CellExecutorImpl.kt:77)
    at org.jetbrains.kotlinx.jupyter.repl.execution.CellExecutor$DefaultImpls.execute$default(CellExecutor.kt:12)
    at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.evaluateUserCode(ReplForJupyterImpl.kt:581)
    at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.access$evaluateUserCode(ReplForJupyterImpl.kt:136)
    at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl$evalEx$1.invoke(ReplForJupyterImpl.kt:439)
    at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl$evalEx$1.invoke(ReplForJupyterImpl.kt:436)
    at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.withEvalContext(ReplForJupyterImpl.kt:417)
    at org.jetbrains.kotlinx.jupyter.repl.impl.ReplForJupyterImpl.evalEx(ReplForJupyterImpl.kt:436)
    at org.jetbrains.kotlinx.jupyter.messaging.IdeCompatibleMessageRequestProcessor$processExecuteRequest$1$response$1$1.invoke(IdeCompatibleMessageRequestProcessor.kt:140)
    at org.jetbrains.kotlinx.jupyter.messaging.IdeCompatibleMessageRequestProcessor$processExecuteRequest$1$response$1$1.invoke(IdeCompatibleMessageRequestProcessor.kt:139)
    at org.jetbrains.kotlinx.jupyter.execution.JupyterExecutorImpl$Task.execute(JupyterExecutorImpl.kt:42)
    at org.jetbrains.kotlinx.jupyter.execution.JupyterExecutorImpl$executorThread$1.invoke(JupyterExecutorImpl.kt:82)
    at org.jetbrains.kotlinx.jupyter.execution.JupyterExecutorImpl$executorThread$1.invoke(JupyterExecutorImpl.kt:80)
    at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

Almost any action triggers this error, whether or not it touches a numeric column.
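The trace suggests that the inferred schema declares the unsigned bigint columns as Long while the values held at runtime are java.math.BigInteger. The following is a minimal sketch of that mismatch in plain Kotlin, without DataFrame or JDBC. It assumes the driver hands back BigInteger for bigint unsigned columns, and toLongOrNull is a hypothetical helper written for illustration, not part of the DataFrame API.

```kotlin
import java.math.BigInteger

// Hypothetical helper (not a DataFrame API): narrow a raw column value to Long
// only when it fits into a signed 64-bit integer, otherwise return null.
fun toLongOrNull(value: Any?): Long? = when (value) {
    null -> null
    is Long -> value
    // bitLength() excludes the sign bit, so < 64 means the value fits in a Long.
    is BigInteger -> if (value.bitLength() < 64) value.toLong() else null
    is Number -> value.toLong()
    else -> null
}

fun main() {
    // Assumption: this is how a bigint unsigned value arrives from the driver.
    val raw: Any? = BigInteger.valueOf(42)

    // A blind cast reproduces the kind of failure shown in the stack trace.
    val failure = runCatching { raw as Long }.exceptionOrNull()
    println(failure is ClassCastException) // prints "true"

    // An explicit, range-checked conversion succeeds for small values.
    println(toLongOrNull(raw)) // prints "42"

    // The largest unsigned 64-bit value (2^64 - 1) does not fit into Long.
    val maxUnsigned = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE)
    println(toLongOrNull(maxUnsigned)) // prints "null"
}
```

The last case shows why mapping bigint unsigned to Long is lossy in general: values above Long.MAX_VALUE have no Long representation, so a schema that keeps BigInteger (or rejects out-of-range values explicitly) avoids silent truncation.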

zaleslaw commented 1 week ago

@apatrida Sorry to hear that, and thanks for reporting. Could you please post the

if it's possible

apatrida commented 6 days ago

The driver and DataFrame both come from %use mysql, dataframe, so they are whatever versions that pulls in (the most recent as of July 1, 2024).

DB is MySQL (Percona server distribution) 8.0.34

apatrida commented 6 days ago

Notebook attached. PerfQueries.ipynb.zip

You just need to set these three environment variables to point at a MySQL database that has the performance schema enabled (the default settings are probably correct).

val URL = "jdbc:mysql://${System.getenv("DB_HOST")}:3306/performance_schema"
val USER_NAME = System.getenv("DB_USER")
val PASSWORD = System.getenv("DB_PASS")

Alternatively, modify the notebook to use hard-coded values.
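One way to support both options at once is to fall back to a default when a variable is not set. This is only a sketch: envOrDefault is a hypothetical helper, and the default values ("localhost", "root", empty password) are placeholders to replace with your own.

```kotlin
// Hypothetical helper: read an environment variable, falling back to a
// hard-coded default when it is missing or blank.
fun envOrDefault(name: String, default: String): String =
    System.getenv(name)?.takeIf { it.isNotBlank() } ?: default

// Placeholder defaults; adjust them to match your own database.
val URL = "jdbc:mysql://${envOrDefault("DB_HOST", "localhost")}:3306/performance_schema"
val USER_NAME = envOrDefault("DB_USER", "root")
val PASSWORD = envOrDefault("DB_PASS", "")
```

With this in place the notebook runs unchanged whether or not the three variables are exported.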

apatrida commented 6 days ago

Note that a fair number of JDBC/SQL features don't work as expected in notebooks; this isn't the only problem I encountered. For example, listing table schemas fails because it expects no database to be selected, but the driver doesn't work without one selected. I think the test suite might need to be more robust for MySQL.

zaleslaw commented 6 days ago

Good afternoon, and thanks for the detailed feedback. We will definitely check your example and try to solve some of these problems in the next release, 0.14. Thanks for the help!