Quansight / omnisci

Explorations on using MapD and Jupyter together.

Redo benchmarking with Ibis-OmniSci #132

Closed kcpevey closed 3 years ago

xmnlab commented 3 years ago

some references:

xmnlab commented 3 years ago

@dharhas what story should we tell with this benchmark?

As I understood from the Slack discussions:

Is there anything else I am missing?

dharhas commented 3 years ago

I think the story is the same as the original blog post, i.e. omnisci lets you do very fast data science on a huge dataset and Ibis gives you a convenient Pandas-like API. We don't have to follow the blog post exactly, i.e. we might want to do some more interesting type of analysis, but as a first pass we can just mimic it exactly.

dharhas commented 3 years ago

If we can also make it run on Linux and Windows that would be nice, but maybe as a stage 2.

dharhas commented 3 years ago

This isn't really a benchmarking task; it is a "look, you can do cool stuff on big data with omnisci" task.

xmnlab commented 3 years ago

I haven't made much progress on this.

kcpevey commented 3 years ago

I started on this, but got slowed down because there wasn't an ibis method to populate an omnisci database from a CSV without bringing it into memory. I started working out the proper raw SQL to do this, but it's unfinished. The alternative is to do "other cool stuff" with existing databases. However, the databases that already exist have been shown over and over in a bunch of examples already...
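
For reference, the raw SQL route could look roughly like the sketch below. It assumes OmniSci's `COPY ... FROM` statement with the `header` and `delimiter` options, that the target table already exists, and that the CSV path is readable by the server process. The helper name `build_copy_statement` and the example table and path are hypothetical, not something from this repo.

```python
import re

# Conservative identifier check so a table name cannot inject extra SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_copy_statement(table, csv_path, header=True, delimiter=","):
    """Build a server-side COPY FROM statement (hypothetical helper).

    The CSV is read by the database server, so nothing is loaded
    into Python memory on the client side.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if len(delimiter) != 1 or delimiter == "'":
        raise ValueError("delimiter must be a single character other than a quote")
    # Escape single quotes inside the path by doubling them.
    path = csv_path.replace("'", "''")
    header_flag = "true" if header else "false"
    return (
        f"COPY {table} FROM '{path}' "
        f"WITH (header='{header_flag}', delimiter='{delimiter}');"
    )


# Example with made-up names:
# build_copy_statement("flights", "/data/flights.csv")
```

The resulting string would still need to be sent to the server through whatever raw SQL execution path is available.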

xmnlab commented 3 years ago

Maybe for the data loading we can use omnisql directly, for example:

cat dump.sql | omnisql -u admin -p HyperInteractive --db omnisci

ref: https://docs.omnisci.com/apis-and-interfaces/omnisql
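
If we want to drive that from a notebook, a minimal sketch would be to pipe the SQL text into omnisql via `subprocess`. It only uses the `-u`, `-p` and `--db` flags from the command above, with the same example credentials as defaults; the function names `build_omnisql_command` and `run_sql` are hypothetical, and it assumes `omnisql` is on the PATH.

```python
import subprocess


def build_omnisql_command(user="admin", password="HyperInteractive", db="omnisci"):
    """Return the omnisql argument list (same flags as the shell example)."""
    return ["omnisql", "-u", user, "-p", password, "--db", db]


def run_sql(sql, **connection):
    """Send SQL text to omnisql on stdin and return whatever it prints.

    Raises subprocess.CalledProcessError if omnisql exits with a non-zero
    status, and FileNotFoundError if omnisql is not installed.
    """
    result = subprocess.run(
        build_omnisql_command(**connection),
        input=sql,            # equivalent to `cat dump.sql | omnisql ...`
        text=True,
        capture_output=True,
        check=True,
    )
    return result.stdout


# Example (needs a running server, so it is left commented out):
# with open("dump.sql") as f:
#     print(run_sql(f.read()))
```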

xmnlab commented 3 years ago

Do we still need to move this task forward?

xmnlab commented 3 years ago

I am closing this issue for now, but if it is still necessary I will reopen it :) Thanks!