Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Migrate the serialization from Klaxon to kotlinx-serialization library #312

Open devcrocod opened 1 year ago

devcrocod commented 1 year ago

This will help integrate DataFrame better with other libraries that require serialization, such as Ktor, ggdsl, and others.

Jolanrensen commented 1 year ago

Didn't @belovrv have concerns about that?

devcrocod commented 1 year ago

@belovrv had doubts about the performance of kotlinx.serialization. We discussed the issue and concluded that there are no obvious problems. However, since the task is quite time-consuming (even though part of the serialization/deserialization logic can be carried over from Klaxon), we decided to postpone its implementation.

zaleslaw commented 1 year ago

I'd like to take on the serialization and I/O tasks, especially those involving both libraries, Klaxon and kotlinx.serialization, so I'll assign this issue to myself.

devcrocod commented 6 months ago

A bit more detail about serialization support.

Benefits:

  1. Performance. Based on my limited research, simple conversion to strings performs on par with Klaxon, since it executes identical logic. With classes marked as serializable and custom serializers, kotlinx-serialization is expected to outperform Klaxon, although the actual implementation of the serializers will also affect this. Overall, performance will be the same as or slightly better than Klaxon's, so I wouldn't consider it the main reason either to switch to kotlinx-serialization or to stick with Klaxon.
  2. Code Refactoring. Supporting kotlinx-serialization will help us improve the code responsible for JSON parsing; when I examined it, I found some code-quality issues. Dedicated serializers for DataFrame would address these: the codebase will grow, but its readability and quality will improve.
  3. Type Inference. This will affect type parsing, as some of the work can be shifted to the kotlinx-serialization plugin, which in turn will help us partially eliminate the use of platform-dependent reflection. Also, kotlinx-serialization has better support for JsonElement; for example, unlike Klaxon, it can recognize NaN.
  4. Kotlin Ecosystem. kotlinx-serialization is an official library and part of the Kotlin ecosystem, which will make it easier for us to integrate with other libraries in that ecosystem, such as Ktor. Its continued support is also assured.
  5. Other Formats. Although the set of officially supported formats is limited, it does, interestingly, include Protobuf.
  6. Multiplatform Support. kotlinx-serialization is a multiplatform library.
  7. Flexibility and reliability. Writing custom serializers will allow better control over serialization and deserialization. For example, this can help avoid issues with serialization when a Map might be inside a column.
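
Item 7 can be illustrated with a hand-written `KSerializer`. This is a minimal sketch, not DataFrame code: `SimpleColumn` and `SimpleColumnSerializer` are hypothetical stand-ins for a column type, and the sketch assumes only the `kotlinx-serialization-json` runtime dependency (no `@Serializable` compiler plugin is needed, because the serializer is written by hand). It shows how a custom serializer decides the JSON shape itself, here writing missing values as explicit `null`s.

```kotlin
import kotlinx.serialization.KSerializer
import kotlinx.serialization.SerializationException
import kotlinx.serialization.descriptors.*
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.*

// Hypothetical column type used only for illustration (not DataFrame's DataColumn).
data class SimpleColumn(val name: String, val values: List<Double?>)

// Hypothetical hand-written serializer: the JSON layout is chosen here, not by a plugin.
object SimpleColumnSerializer : KSerializer<SimpleColumn> {
    override val descriptor: SerialDescriptor = buildClassSerialDescriptor("SimpleColumn") {
        element<String>("name")
        element<List<Double?>>("values")
    }

    override fun serialize(encoder: Encoder, value: SimpleColumn) {
        // This sketch supports JSON only; other formats are rejected explicitly.
        val json = encoder as? JsonEncoder
            ?: throw SerializationException("SimpleColumnSerializer supports JSON only")
        json.encodeJsonElement(buildJsonObject {
            put("name", value.name)
            putJsonArray("values") {
                // add(Number?) writes a JSON null for a missing value.
                value.values.forEach { add(it) }
            }
        })
    }

    override fun deserialize(decoder: Decoder): SimpleColumn {
        val json = decoder as? JsonDecoder
            ?: throw SerializationException("SimpleColumnSerializer supports JSON only")
        val obj = json.decodeJsonElement().jsonObject
        val name = obj.getValue("name").jsonPrimitive.content
        // JsonNull is a JsonPrimitive whose doubleOrNull is null, so nulls round-trip.
        val values = obj.getValue("values").jsonArray.map { it.jsonPrimitive.doubleOrNull }
        return SimpleColumn(name, values)
    }
}
```

Under the same assumptions, `Json.encodeToString(SimpleColumnSerializer, SimpleColumn("a", listOf(1.0, null, 2.5)))` produces an object whose `values` array contains an explicit `null`, and `Json.decodeFromString(SimpleColumnSerializer, text)` restores an equal `SimpleColumn`. Going through `JsonEncoder`/`JsonDecoder` gives up format independence in exchange for direct access to the JSON tree, which is convenient when a column can hold irregular content such as a Map.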

I envision the migration from klaxon to kotlinx-serialization as follows:

  1. To start, we can simply drop Klaxon by replacing Klaxon's JsonElement with kotlinx-serialization's JsonElement. Since the logic is identical, this won't be too difficult. We only need to account for the fact that kotlinx-serialization has JsonPrimitive, unlike Klaxon, and that jsonObjectBuilder looks a bit different.
  2. This is a more complex part that can be broken down into several steps:
    • The JSON object and schema-related transformations (https://kotlin.github.io/dataframe/read.html#specify-key-value-paths). We will need to analyze how to improve this part, or whether to keep the simple parsing of the JSON tree.
    • Separating serialization from deserialization, broken down by specific objects such as DataFrame and DataColumn.
    • Identifying the objects that are needed when working with JSON, since working with open interfaces is challenging.
    • Writing custom serializers for classes
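
Step 1 of the plan mostly comes down to walking the kotlinx-serialization `JsonElement` tree and unwrapping `JsonPrimitive` values explicitly. The sketch below assumes only the `kotlinx-serialization-json` dependency; `readColumns`, `columnsToJson`, and `toKotlinValue` are hypothetical helpers, not the DataFrame API, and the type-recovery order (null, string, boolean, Long, then Double) is an illustrative choice. It turns a JSON array of objects into per-key column lists and back, using `buildJsonArray`/`addJsonObject` on the writing side.

```kotlin
import kotlinx.serialization.json.*

// Hypothetical helper: unwrap a JsonPrimitive into a plain Kotlin value.
// Order matters: JsonNull is itself a JsonPrimitive, and quoted strings such as "1"
// must stay strings rather than being parsed as numbers.
fun JsonPrimitive.toKotlinValue(): Any? = when {
    this is JsonNull -> null
    isString -> content
    booleanOrNull != null -> boolean
    longOrNull != null -> long
    else -> doubleOrNull ?: content  // fall back to the raw text for anything else
}

// Hypothetical helper: read a JSON array of objects into columns keyed by field name.
fun readColumns(text: String): Map<String, List<Any?>> {
    val rows = Json.parseToJsonElement(text).jsonArray.map { it.jsonObject }
    // Union of keys in first-seen order, so rows with missing keys still line up.
    val keys = LinkedHashSet<String>()
    rows.forEach { keys.addAll(it.keys) }
    return keys.associateWith { key ->
        rows.map { row ->
            when (val element = row[key]) {
                null -> null                       // key absent in this row
                is JsonPrimitive -> element.toKotlinValue()
                else -> element                    // nested object/array kept as a JsonElement
            }
        }
    }
}

// Hypothetical helper: write columns back as a JSON array of row objects.
fun columnsToJson(columns: Map<String, List<Any?>>): JsonArray {
    val rowCount = columns.values.maxOfOrNull { it.size } ?: 0
    return buildJsonArray {
        for (i in 0 until rowCount) {
            addJsonObject {
                for ((name, values) in columns) {
                    when (val v = values.getOrNull(i)) {
                        null -> put(name, JsonNull)
                        is String -> put(name, v)
                        is Number -> put(name, v)
                        is Boolean -> put(name, v)
                        is JsonElement -> put(name, v)
                        else -> put(name, v.toString())
                    }
                }
            }
        }
    }
}
```

Because every scalar arrives as a `JsonPrimitive`, the unwrapping order in `toKotlinValue` decides the inferred type. A real migration would need to reproduce whatever rules DataFrame's existing JSON reader applies, which this sketch does not attempt.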

zaleslaw commented 6 months ago

@devcrocod, could we say that this is what's blocking us from going multiplatform?

Jolanrensen commented 6 months ago

@zaleslaw Just one of many: https://github.com/Kotlin/dataframe/issues/24#issuecomment-1958239332