Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Add inner / Struct type support in Arrow #536

Open fb64 opened 10 months ago

fb64 commented 10 months ago

An Arrow Struct type is read as a Map<String, Any?> object: https://github.com/Kotlin/dataframe/blob/86b80e0c9cd372334e8eff05115a7c50b6ea61bc/dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowReadingImpl.kt#L171-L173

But writing does not support Map objects, and by default the value is serialized as a String: https://github.com/Kotlin/dataframe/blob/86b80e0c9cd372334e8eff05115a7c50b6ea61bc/dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowTypesMatching.kt#L93-L95

The following test fails because column c holds a LinkedHashMap in a SingletonList in the expected DataFrame, but a single String in an ArrayList in the readIpc object:

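    @Test
    fun testReadIPC() {
        val a by columnOf("one")
        val b by columnOf(2.0)
        // Column c holds a single Map value, i.e. the shape a Struct is read back as.
        // Text is presumably Arrow's org.apache.arrow.vector.util.Text.
        val c by listOf(
            mapOf(
                "c1" to Text("inner"),
                "c2" to 4.0,
                "c3" to 50.0,
            ) as Map<String, Any?>
        ).toColumn()
        val d by columnOf("four")
        val expected = dataFrameOf(a, b, c, d)
        // Round trip: write to Arrow IPC bytes, then read them back.
        val readIpc = DataFrame.readArrowIPC(expected.saveArrowIPCToByteArray())
        // Fails: c comes back as a String instead of a Map.
        readIpc shouldBe expected
    }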
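
The mismatch can be reproduced without Arrow at all. The following is a minimal sketch in plain Kotlin, assuming the writer's fallback behaves like calling toString() on the value (that is an assumption, and fallbackSerialize is a hypothetical stand-in, not a function from the library):

```kotlin
// Hypothetical stand-in for the writer's fallback (assumption: values of
// unsupported types are stored as their string form).
fun fallbackSerialize(value: Any?): String? = value?.toString()

fun main() {
    val original: Map<String, Any?> = mapOf("c1" to "inner", "c2" to 4.0, "c3" to 50.0)

    // What would come back after a write/read round trip under this assumption.
    val roundTripped: Any? = fallbackSerialize(original)

    println(roundTripped)             // {c1=inner, c2=4.0, c3=50.0}
    println(roundTripped == original) // false: a String never equals a Map
}
```

Once the map has been flattened to text, the reader has no way of knowing it was ever a struct, so the equality check in the test cannot succeed.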


It could be worth adding support for inner types by writing Map<String, Any?> values to a Struct field. Some points need to be addressed before implementation:

Originally posted by @fb64 in https://github.com/Kotlin/dataframe/issues/528#issuecomment-1843132618
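
To illustrate the proposal above, here is a rough sketch of how a struct-like schema might be derived from a Map<String, Any?> value. Everything in it is hypothetical: SketchType and inferType are simplified stand-ins invented for this example, not the Arrow classes or the dataframe-arrow API, and only Double, nested Map and a String fallback are handled.

```kotlin
// Hypothetical, simplified model of field types. The real Arrow classes
// (org.apache.arrow.vector.types.pojo) are deliberately not used here.
sealed class SketchType {
    object Utf8 : SketchType() { override fun toString() = "Utf8" }
    object Float64 : SketchType() { override fun toString() = "Float64" }
    data class Struct(val children: Map<String, SketchType>) : SketchType()
}

// Derive one child field per map key; anything unrecognized falls back to a
// string type, mirroring the current String fallback described in the issue.
fun inferType(value: Any?): SketchType = when (value) {
    is Double -> SketchType.Float64
    is Map<*, *> -> SketchType.Struct(
        value.entries.associate { it.key.toString() to inferType(it.value) }
    )
    else -> SketchType.Utf8
}

fun main() {
    val row = mapOf("c1" to "inner", "c2" to 4.0, "c3" to 50.0)
    println(inferType(row)) // Struct(children={c1=Utf8, c2=Float64, c3=Float64})
}
```

The sketch looks at a single value only. A real implementation would also have to decide what to do when rows have different keys, when a key holds different types across rows, and how null values map to field nullability.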

zaleslaw commented 5 months ago

We need to answer this during this milestone!