dharhas closed 4 years ago
Added size.
Priority 1. Benchmark Pandas against Ibis with the OmnisciDB backend.
Priority 2. Benchmark Pandas+Dask against Ibis with the OmnisciDB backend.
Priority 3. Benchmark Ibis with the BigQuery and OmnisciDB backends.
Datasets for the benchmark
References:
the repo for ibis-benchmark is https://github.com/Quansight/ibis-benchmark
some updates:
./scripts/download.sh
./scripts/load_data.sh
{
"omniscidb": {
"table_head": 0.03497052192687988,
"table_tail": 0.024263644218444826
},
"pandas": {
"table_head": 0.15434300899505615,
"table_tail": 0.1464146375656128
}
}
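To make the numbers above easier to compare, here is a minimal sketch that computes how much longer pandas took than omniscidb for each operation. It only reuses the values from the result above; the units are not stated there (seconds is an assumption), but the ratio does not depend on them. The names `results` and `slowdown` are just for illustration.

```python
import json

# Values copied verbatim from the result posted above.
results = json.loads("""
{
  "omniscidb": {"table_head": 0.03497052192687988, "table_tail": 0.024263644218444826},
  "pandas": {"table_head": 0.15434300899505615, "table_tail": 0.1464146375656128}
}
""")

# Ratio of pandas time to omniscidb time per operation (unit-independent).
slowdown = {
    op: results["pandas"][op] / results["omniscidb"][op]
    for op in results["pandas"]
}

for op, ratio in slowdown.items():
    print(f"{op}: pandas took {ratio:.1f}x as long as omniscidb")
```

Two operations measured once each are not enough to draw conclusions, so this is only a convenience for reading the table.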
./ibis_benchmark/benchmark.py
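For anyone who wants the general idea without opening the repo, below is a rough sketch of how a per-backend, per-operation timing report with the same shape as the result above could be produced. This is not the actual code in benchmark.py. The helpers `time_call` and `run_benchmark`, the backend names, and the stand-in operations on a plain Python list are all hypothetical; in a real run the callables would wrap the Ibis or pandas calls being measured.

```python
import json
import time


def time_call(fn, repeat=5):
    """Return the mean wall-clock time (seconds) of calling fn() `repeat` times."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def run_benchmark(backends, repeat=5):
    """Time every operation of every backend; returns {backend: {operation: seconds}}."""
    return {
        backend: {name: time_call(fn, repeat) for name, fn in ops.items()}
        for backend, ops in backends.items()
    }


# Stand-in data and operations (hypothetical). Real callables would execute
# the head/tail expressions against each backend instead.
rows = list(range(100_000))
backends = {
    "backend_a": {
        "table_head": lambda: rows[:5],
        "table_tail": lambda: rows[-5:],
    },
    "backend_b": {
        "table_head": lambda: list(rows)[:5],  # copies first, so it is slower
        "table_tail": lambda: list(rows)[-5:],
    },
}

report = run_benchmark(backends)
print(json.dumps(report, indent=2))
```

Averaging over a few repeats is a design choice here to smooth out noise; whether the real script uses a single run, a mean, or a minimum is not something this sketch claims to know.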
The next step is to add more complex expressions to the benchmark.
@dharhas should this benchmark be initially for CPU only? GPU? both?
both would make sense to me.
the code is currently at https://github.com/quansight/ibis-benchmark and the report page is currently here: https://quansight.github.io/ibis-benchmark/report-nyc-taxi.html
PS: the pandas benchmark is not included in this report. That was discussed in the meeting, and the benchmark strategy may change.
Perform and communicate proper benchmarking to show how Ibis compares to other tools. Target different Ibis backends as well as PostgreSQL/PostGIS for geospatial data types. Both numerical benchmarks and the user experience of (Altair) charting on top of the different backends should be explored. OmniSci will provide access to some large sample datasets for use in the benchmark.