edgedb / imdbench

IMDBench — Realistic ORM benchmarking
https://edgedb.github.io/imdbench
Apache License 2.0

Add eschema and 3 queries that will be used in the benchmarks. #1

Closed vpetrovykh closed 6 years ago

vpetrovykh commented 6 years ago

In the end I opted for generating the entire movie dataset, largely because I'm not a lawyer and can't guarantee that the "free for non-commercial use" files are OK for us to use. For benchmarks we don't care about real data; we care that the data has a realistic structure and that we can create an arbitrary amount of it. The current data generator does that: it generates movies, directors, actors, etc. in approximately the correct proportions.
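To illustrate the idea of synthetic data with a realistic shape, here is a minimal sketch of such a generator. It is not the generator in this PR: the function name `generate_dataset`, the field names, the one-movie-per-four-people ratio, and the cast sizes are all assumptions made up for the example.

```python
import random


def generate_dataset(people, users, reviews, movies=None, seed=0):
    """Build a synthetic movie dataset as plain dicts (hypothetical layout)."""
    if people < 1 or users < 1:
        raise ValueError("need at least one person and one user")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    if movies is None:
        # Assumed proportion for this sketch: one movie per four people.
        movies = max(1, people // 4)

    person_ids = list(range(people))
    dataset = {
        "people": [{"id": i, "name": f"Person {i}"} for i in person_ids],
        "users": [{"id": i, "name": f"User {i}"} for i in range(users)],
        "movies": [],
        "reviews": [],
    }

    for m in range(movies):
        # Pick a small group of distinct people: first is the director,
        # the rest are the cast.
        crew = rng.sample(person_ids, min(people, rng.randint(1, 5)))
        dataset["movies"].append(
            {"id": m, "title": f"Movie {m}", "director": crew[0], "cast": crew[1:]}
        )

    for r in range(reviews):
        # Each review links one random user to one random movie.
        dataset["reviews"].append(
            {
                "id": r,
                "author": rng.randrange(users),
                "movie": rng.randrange(movies),
                "rating": rng.randint(1, 5),
            }
        )
    return dataset
```

Calling `generate_dataset(people=10_000, users=10_000, reviews=50_000)` would mirror the parameter names used in this PR, but the timing and proportions of this sketch say nothing about the real generator.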

There's a simple converter that takes this data and outputs a whole bunch of EQL. However, for a non-trivial dataset we're talking megabytes of EQL, and if we ever want a dataset with 500,000 reviews, that'll easily be close to 500 MB of EQL. So feeding it into the server may require tweaking the output a bit and maybe splitting it into multiple chunks. Incidentally, the really large datasets also take a few minutes to generate (people=10_000, users=10_000, reviews=50_000 -> ~3 minutes).
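One possible way to produce multiple chunks is to group the generated statements so that each chunk stays under a byte limit. The sketch below assumes the converter can yield its output one statement at a time as strings; `chunk_statements` and `max_bytes` are hypothetical names, not part of this PR.

```python
def chunk_statements(statements, max_bytes):
    """Yield newline-joined groups of statements, each at most max_bytes
    when encoded as UTF-8. A single statement larger than max_bytes is
    emitted alone rather than split."""
    chunk, size = [], 0
    for stmt in statements:
        stmt_size = len(stmt.encode("utf-8")) + 1  # +1 for the joining newline
        if chunk and size + stmt_size > max_bytes:
            yield "\n".join(chunk)
            chunk, size = [], 0
        chunk.append(stmt)
        size += stmt_size
    if chunk:
        yield "\n".join(chunk)
```

Because this is a generator, the full output never has to be held in memory at once: each chunk can be written to its own file or sent to the server before the next one is built.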