coiled / feedback

A place to provide Coiled feedback
14 stars 3 forks source link

Software version information missing from benchmarks documentation #289

Open jgrg opened 3 months ago

jgrg commented 3 months ago

Your documentation page DataFrames at Scale Comparison: TPC-H has some good information on how you setup the benchmarks but it does not mention the versions of any of the software. This is important if I want to know if the results are still current.

I also don't understand what the More than SQL column in the feature comparison table means. DuckDB has a cross in this row, but it has ways of using it without touching the SQL layer such as relational on Pandas or Ibis.

scharlottej13 commented 3 months ago

Hi @jgrg! Thanks for opening up this issue.

it does not mention the versions of any of the software. This is important if I want to know if the results are still current.

We used the following package versions:

pyspark[sql]==3.4.1 
polars==0.20.16
duckdb==0.10.1

I also don't understand what the More than SQL column in the feature comparison table means. DuckDB has a cross in this row, but it has ways of using it without touching the SQL layer such as relational on Pandas or Ibis.

Thanks for pointing this out! This table is certainly a more subjective summary (especially as compared to the benchmark results). It seems like this could be a good opportunity for us to try out Ibis or DuckDB's relational pandas API and consider making some updates.