jgrg opened this issue 3 months ago
Hi @jgrg! Thanks for opening up this issue.
> it does not mention the versions of any of the software. This is important if I want to know whether the results are still current.
We used the following package versions:
pyspark[sql]==3.4.1
polars==0.20.16
duckdb==0.10.1
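To check whether a local environment matches the versions above, a small stdlib-only script can compare the pins with what is installed. This is a sketch: `BENCHMARK_PINS`, `parse_pin` and `compare_pins` are hypothetical helper names, and only exact `==` pins are handled.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Pins quoted in this thread for the published benchmark run.
BENCHMARK_PINS = ["pyspark[sql]==3.4.1", "polars==0.20.16", "duckdb==0.10.1"]


def parse_pin(spec: str) -> tuple[str, str]:
    """Split 'name[extra]==X.Y.Z' into ('name', 'X.Y.Z'), dropping any extras."""
    name, _, pinned = spec.partition("==")
    return name.split("[", 1)[0].strip(), pinned.strip()


def compare_pins(pins: list[str]) -> dict[str, tuple[str, str | None]]:
    """Map each package to (pinned version, installed version or None)."""
    report: dict[str, tuple[str, str | None]] = {}
    for spec in pins:
        name, pinned = parse_pin(spec)
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (pinned, installed)
    return report


if __name__ == "__main__":
    for name, (pinned, installed) in compare_pins(BENCHMARK_PINS).items():
        if installed is None:
            status = "not installed"
        elif installed == pinned:
            status = "same as benchmark"
        else:
            status = f"installed {installed}"
        print(f"{name}: benchmarked with {pinned}, {status}")
```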
> I also don't understand what the "More than SQL" entry in the feature comparison table means. DuckDB has a cross there, but it offers ways of using it without touching the SQL layer, such as its Relational API on pandas DataFrames, or Ibis.
Thanks for pointing this out! This table is admittedly a more subjective summary (especially compared with the benchmark results). This seems like a good opportunity for us to try out Ibis and DuckDB's Relational API on pandas, and to consider updating the table.
Your documentation page "DataFrames at Scale Comparison: TPC-H" has good information on how you set up the benchmarks, but it does not mention the versions of any of the software. This is important if I want to know whether the results are still current.
I also don't understand what the "More than SQL" entry in the feature comparison table means. DuckDB has a cross there, but it offers ways of using it without touching the SQL layer, such as its Relational API on pandas DataFrames, or Ibis.