Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

GroupBy can't be properly formatted #423

Open revintec opened 1 year ago

revintec commented 1 year ago

Using Jupyter Lab 4.0.2 and the latest versions of kotlin jupyter and kotlin/dataframe (I don't know how to get the kotlin jupyter and kotlin/dataframe versions, but I installed them today).

After running dataFrame.groupBy{...}, the returned value is org.jetbrains.kotlinx.dataframe.impl.GroupByImpl. The mem, ssd, and vad columns are LongArrays, and they are correctly formatted, like this:

image

However, long model strings are shortened.

I'd like to change that, but not globally, just for this dataframe. So I wrote the following code, but it has 3 problems:

  1. LongArray columns are not properly rendered (there is no toHTML in GroupByImpl, and using into(...) results in the same problem).

    image
  2. cellContentLimit=-1 does not work as described in the docs: https://kotlin.github.io/dataframe/tohtml.html#configuring-display-for-individual-output

    image image
  3. Cell text is wrapped even though the dataframe is also horizontally scrollable. How can we disable cell text wrapping?

    image
revintec commented 1 year ago

Is it possible to add a <-|-> resize cursor at the edge of a column to adjust the column width dynamically? Other software also supports double-clicking the edge of a column to automatically widen it to fit all content.

koperagen commented 1 year ago

Hi. DisplayConfiguration(cellContentLimit = -1) indeed doesn't work. Rendering of arrays is also an oversight: the Jupyter integration does it using a CellRenderer that is not available in the public API. Sorry for the inconvenience :(

I can suggest a workaround for the rendering and wrapping problems until they're fixed in the library. Declare this renderer:

import org.jetbrains.kotlinx.dataframe.io.DisplayConfiguration
import org.jetbrains.kotlinx.dataframe.jupyter.ChainedCellRenderer
import org.jetbrains.kotlinx.dataframe.jupyter.DefaultCellRenderer
import org.jetbrains.kotlinx.dataframe.jupyter.RenderedContent

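// Renders LongArray cells as "[1, 2, 3]" and delegates everything else to DefaultCellRenderer.
class MyCellRenderer : ChainedCellRenderer(DefaultCellRenderer) {
    override fun maybeContent(value: Any?, configuration: DisplayConfiguration): RenderedContent? {
        if (value is LongArray) {
            return RenderedContent.text(value.joinToString(prefix = "[", postfix = "]"))
        }
        // Not a LongArray: return null so this renderer does not handle the value.
        return null
    }

    // No custom tooltip for any value.
    override fun maybeTooltip(value: Any?, configuration: DisplayConfiguration): String? {
        return null
    }
}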
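
The same idea should extend to other array types. Below is a minimal, library-free sketch of just the formatting step; the name formatArrayCell is hypothetical and not part of kotlin/dataframe. It returns null for values it does not handle, on the assumption that null from maybeContent means "fall back to the parent renderer".

```kotlin
// Hypothetical helper, not part of kotlin/dataframe.
// Returns bracketed text for array values and null for everything else.
fun formatArrayCell(value: Any?): String? = when (value) {
    is LongArray -> value.joinToString(prefix = "[", postfix = "]")
    is IntArray -> value.joinToString(prefix = "[", postfix = "]")
    is DoubleArray -> value.joinToString(prefix = "[", postfix = "]")
    is Array<*> -> value.joinToString(prefix = "[", postfix = "]")
    else -> null // not an array: leave it to another renderer
}

fun main() {
    println(formatArrayCell(longArrayOf(1, 2, 3))) // prints [1, 2, 3]
    println(formatArrayCell("text"))               // prints null
}
```

Inside maybeContent this could presumably be used as formatArrayCell(value)?.let { RenderedContent.text(it) }, which keeps the renderer class short while covering more array types.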

And use it like this:

df.toHTML(DisplayConfiguration(cellContentLimit = 100), MyCellRenderer())
        .plus(DataFrameHtmlData("""
            td {
                white-space: nowrap;
            }
        """.trimIndent()))

The custom td style also fixes the wrapping.
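
Since cellContentLimit = -1 is not honoured, one possible way to avoid guessing a fixed limit such as 100 is to derive it from the data. The sketch below is plain Kotlin with no dataframe dependency; longestCellLength is a hypothetical helper, and it is an assumption that cellContentLimit is measured in characters of the rendered text.

```kotlin
// Hypothetical helper, not part of kotlin/dataframe.
// Length of the longest toString() representation among the given cell values,
// or 0 when there are no values.
fun longestCellLength(values: Iterable<Any?>): Int =
    values.maxOfOrNull { it.toString().length } ?: 0

fun main() {
    val models = listOf("short", "a considerably longer model string", null)
    // A value derived like this could be passed as cellContentLimit.
    println(longestCellLength(models))
}
```

Note that toString() on a LongArray does not produce the bracketed text that MyCellRenderer emits, so array columns would need to be measured after formatting, and adding a small margin to the result may be safer than using the exact length.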

Regarding adjusting column width, I can say that it's not possible. Right now we're working on Swing table rendering (which is far more interactive) in the Kotlin notebooks plugin for IntelliJ.