Closed kcpevey closed 3 years ago
@dharhas what story should we tell about this benchmark?
As I understood from the Slack discussions:
is there anything else I am missing?
I think the story is the same as the original blog post, i.e. omnisci lets you do very fast data science on a huge dataset and Ibis gives you a convenient Pandas-like API. We don't have to follow the blog post exactly, i.e. we might want to do some more interesting type of analysis, but as a first pass we can just mimic it exactly.
If we can also make it run on Linux and Windows that would be nice, but maybe as a stage 2.
This isn't really a benchmarking task; it is a "look, you can do cool stuff on big data with omnisci" demo.
I haven't made much progress on it.
I started on this, but got slowed down because there wasn't an ibis method to populate an omnisci database from CSV without bringing it into memory. I started working out the proper raw SQL to do this, but it's unfinished. The alternative is to do "other cool stuff" with existing databases. However, the databases that already exist have been shown over and over in a bunch of examples already....
maybe for the data loading we can use omnisql directly, for example:
cat dump.sql | omnisql -u admin -p HyperInteractive --db omnisci
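A rough sketch of how that could look from Python, so the CSV never has to go through pandas. This assumes omnisci's `COPY ... FROM` statement, that the target table already exists, and that the CSV path is readable by the server. The helper names, table name and path are hypothetical; the omnisql flags are just the ones from the one-liner above.

```python
import subprocess


def build_copy_sql(table, csv_path, header=True):
    """Build a COPY statement that loads a server-side CSV into an existing table."""
    # Refuse paths containing a quote rather than guess at the escaping rules.
    if "'" in csv_path:
        raise ValueError("csv_path must not contain single quotes")
    header_flag = "true" if header else "false"
    return f"COPY {table} FROM '{csv_path}' WITH (header='{header_flag}');"


def build_omnisql_cmd(user="admin", password="HyperInteractive", db="omnisci"):
    """Return the omnisql invocation, using the same flags as the one-liner above."""
    return ["omnisql", "-u", user, "-p", password, "--db", db]


def load_csv(table, csv_path):
    """Pipe the COPY statement to omnisql on stdin (needs omnisql on PATH)."""
    sql = build_copy_sql(table, csv_path)
    return subprocess.run(build_omnisql_cmd(), input=sql, text=True, check=True)
```

Calling `load_csv("flights", "/data/flights.csv")` (hypothetical table and path) would then do the same thing as piping a one-line `dump.sql` into omnisql.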
do we still need to move this task forward?
I am closing this issue for now, but if it is still necessary I will reopen it :) thanks!
some references: