Quansight / omnisci

Explorations on using MapD and Jupyter together.

Benchmarking #47

Closed dharhas closed 4 years ago

dharhas commented 5 years ago

Perform and communicate proper benchmarking to show how this stack compares to other tools. Target different Ibis backends, and compare against PostgreSQL/PostGIS for geospatial data types. Both numerical benchmarks and the user experience of (Altair) charting on top of the different backends should be explored. OmniSci will provide access to some large sample datasets for use in the benchmark.

xmnlab commented 5 years ago

Added size.

xmnlab commented 5 years ago

Priority 1. Benchmark Pandas against Ibis with OmnisciDB backend.
Priority 2. Benchmark Pandas+Dask against Ibis with OmnisciDB backend.
Priority 3. Benchmark Ibis with Big Query and OmnisciDB backends.

xmnlab commented 5 years ago

Datasets for the benchmark

San Francisco bus dataset
NYC 2017 Yellow Taxi Trip Data: https://data.cityofnewyork.us/Transportation/2017-Yellow-Taxi-Trip-Data/biws-g3hs

xmnlab commented 5 years ago

The repo for ibis-benchmark is https://github.com/Quansight/ibis-benchmark

Some updates:

It currently works with the nyc-taxi dataset.
To download the dataset, run: ./scripts/download.sh
It currently works with omniscidb running locally (under docker it raises a memory error, which still needs to be investigated).
To load the nyc-taxi dataset into omniscidb, run: ./scripts/load_data.sh

Currently it only checks very simple operations (head and tail) and stores the results in /tmp/log_benchmark.json, e.g.:

{
  "omniscidb": {
    "table_head": 0.03497052192687988,
    "table_tail": 0.024263644218444826
  },
  "pandas": {
    "table_head": 0.15434300899505615,
    "table_tail": 0.1464146375656128
  }
}
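
As a reading aid for a log like the one above, the following sketch computes how many times longer one backend took than another for each operation. It is only an illustration: the speedup helper is a hypothetical name, not part of ibis-benchmark, and the numbers are copied from the sample log, where the values are assumed to be seconds. Because head and tail are very cheap operations, these ratios should not be read as a general performance comparison.

```python
import json

# Sample log copied from the comment above (values assumed to be seconds).
log = json.loads("""
{
  "omniscidb": {
    "table_head": 0.03497052192687988,
    "table_tail": 0.024263644218444826
  },
  "pandas": {
    "table_head": 0.15434300899505615,
    "table_tail": 0.1464146375656128
  }
}
""")


def speedup(log, baseline, candidate):
    """Hypothetical helper: ratio of baseline time to candidate time per operation."""
    return {
        op: log[baseline][op] / log[candidate][op]
        for op in log[baseline]
        if op in log[candidate]  # only compare operations both backends ran
    }


ratios = speedup(log, baseline="pandas", candidate="omniscidb")
for op, ratio in ratios.items():
    print(f"{op}: pandas took {ratio:.1f}x as long as omniscidb")
```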

The benchmark operations are defined in ./ibis_benchmark/benchmark.py

The next step is to add more complex expressions to the benchmark.
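
For readers who want a feel for how head and tail timings like these could be collected, here is a minimal sketch of a timing harness. It is not the actual content of ./ibis_benchmark/benchmark.py: the names time_call and run_benchmark, the stand_in backend, and the output file name are all assumptions made for illustration. A plain Python list stands in for a table so the example runs without a database; in a real run each entry would wrap a pandas or Ibis expression.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict


def time_call(func: Callable[[], object], repeat: int = 5) -> float:
    """Return the mean wall-clock time, in seconds, over `repeat` calls."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def run_benchmark(
    backends: Dict[str, Dict[str, Callable[[], object]]], repeat: int = 5
) -> Dict[str, Dict[str, float]]:
    """Time every named operation of every backend."""
    return {
        backend: {name: time_call(op, repeat) for name, op in ops.items()}
        for backend, ops in backends.items()
    }


# Hypothetical stand-in for a table; not a real pandas or OmniSciDB backend.
rows = list(range(1_000_000))

backends = {
    "stand_in": {
        "table_head": lambda: rows[:5],
        "table_tail": lambda: rows[-5:],
    },
}

results = run_benchmark(backends)

# Write the results in the same nested shape as the log shown earlier.
log_path = Path(tempfile.gettempdir()) / "log_benchmark_sketch.json"
log_path.write_text(json.dumps(results, indent=2))
print(log_path.read_text())
```

With this layout, adding a more complex expression would amount to adding another named callable to a backend's dictionary.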

xmnlab commented 5 years ago

@dharhas should this benchmark be initially for CPU only? GPU? both?

dharhas commented 5 years ago

both would make sense to me.

xmnlab commented 4 years ago

The code is currently at https://github.com/quansight/ibis-benchmark and the report page is here: https://quansight.github.io/ibis-benchmark/report-nyc-taxi.html

PS: the pandas benchmark is not included in this report. That was discussed in the meeting, and the benchmark strategy may change.