Open timsaucer opened 1 month ago
The example above is a very simple approach and I think could add some immediate value. Even better would be to do something like pandas where we have a Styler class that allows for nuanced and expressive displays.
https://pandas.pydata.org/docs/user_guide/style.html
https://github.com/pandas-dev/pandas/blob/main/pandas/io/formats/style.py
I don't think we necessarily need to support all of the output formats they do, but it would be nice at least to give users some formatting ability on their tables. These are some of the features I think we need to gain wider adoption.
A follow-on question: if we were to build a styler to output things like HTML (or LaTeX, etc.), does it make sense to do so in the datafusion-python repo, or to push it up into the datafusion repo?
Rounding out options.
I recently came across this Python library dedicated to creating nicely formatted HTML tables: great-tables. It currently works with polars and pandas, so any datafusion user today could call df.to_polars() or df.to_pandas() and then use it.
Of course, the conversion feels clunky, so if we went this route, we could explore adding support for datafusion tables upstream.
Again, just rounding out options. I don't have any strong thoughts on this feature request.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Many users, especially those who want to try out DataFusion for the first time, will use notebooks, either Jupyter, Databricks, or others. It would be a nice feature to have dataframes shown in these notebooks rendered using html like some other dataframe libraries.
Describe the solution you'd like
In order to do this, we need to implement _repr_html_ on the PyDataFrame object. This can operate in the same manner as show() and limit the output to a few lines. Additional enhancements could include setting config parameters for how much data to show.
Describe alternatives you've considered
The other alternative is to continue to use show() to inspect the data. Users can also output the dataframe to pandas and then use its rendering capability.
Additional context
Here is a minimal demonstrable version we could start with in PyDataFrame
This produces the following example:
![Screenshot 2024-05-22 at 3 02 07 PM](https://github.com/apache/datafusion-python/assets/24943992/b69a1522-a711-4173-a570-d2e136d461e7)
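For readers less familiar with the notebook hook, here is a rough pure-Python sketch of the _repr_html_ idea described in this request. It is not the PyDataFrame snippet referred to above and does not use the datafusion API: the HtmlPreview class and its max_rows setting are hypothetical names invented for illustration. The only real convention relied on is that Jupyter/IPython calls an object's _repr_html_() method, when present, to render it.

```python
from html import escape
from typing import Any, Sequence


class HtmlPreview:
    """Hypothetical stand-in for a dataframe; NOT the datafusion API."""

    def __init__(
        self,
        columns: Sequence[str],
        rows: Sequence[Sequence[Any]],
        max_rows: int = 10,  # hypothetical display setting (assumption)
    ) -> None:
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]
        self.max_rows = max_rows

    def _repr_html_(self) -> str:
        # Notebooks call this method to obtain an HTML rendering of the object.
        shown = self.rows[: self.max_rows]  # limit output, similar in spirit to show()
        header = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in shown
        )
        html = f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"
        hidden = len(self.rows) - len(shown)
        if hidden > 0:
            # Tell the user that the preview was truncated.
            html += f"<p>{hidden} more row(s) not shown</p>"
        return html


preview = HtmlPreview(["a", "b"], [[1, "<x>"], [2, "y"], [3, "z"]], max_rows=2)
print(preview._repr_html_())
```

Cell values are passed through html.escape so that data containing markup is displayed rather than interpreted. In a notebook, leaving preview as the last expression of a cell would render the table instead of the plain repr.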