fivetran / benchmark

Benchmark data warehouses under Fivetran-like conditions

Expand databases benchmarked #3

Closed russellpierce closed 3 years ago

russellpierce commented 6 years ago

Athena and Redshift Spectrum are BigQuery-like. One wonders how they would stack up in this comparison.

georgewfraser commented 6 years ago

I definitely want to add these. The tricky part of this is that you have to make a bunch of choices when you put the data in S3. Even if we just decide to go with Parquet, we have to decide how big to make the files and the blocks within the files. So it will take some fiddling around to be sure we're being fair to Athena/Spectrum.
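To make the file-size and block-size choice concrete, here is a minimal sketch of the arithmetic involved. The function name `parquet_layout` and the sizes used (100 GiB of data, 1 GiB files, 128 MiB blocks) are illustrative assumptions, not values from the benchmark, and "block" is taken here to mean a Parquet row group.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def parquet_layout(total_bytes, file_bytes, row_group_bytes):
    """Return (number of files, row groups per full file) for a target layout.

    Hypothetical helper: it only does the arithmetic and writes no data.
    """
    if row_group_bytes > file_bytes:
        raise ValueError("a row group cannot be larger than its file")
    n_files = ceil_div(total_bytes, file_bytes)
    groups_per_file = ceil_div(file_bytes, row_group_bytes)
    return n_files, groups_per_file

MiB = 1024 ** 2
GiB = 1024 ** 3

# Assumed example sizes, chosen only for illustration.
layout = parquet_layout(100 * GiB, 1 * GiB, 128 * MiB)
print(layout)  # (100, 8)
```

Each different pair of sizes yields a different number of S3 objects and a different number of blocks per object, which is why the choice has to be made deliberately before any timing is comparable.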

russellpierce commented 6 years ago

I agree. Apples to apples is hard - if not impossible. Maybe I'm atypical, but as far as I'm concerned defaults or a (very) limited parameter search is reasonable. Unlike with Redshift's sort keys, where Amazon makes a big deal about optimizing for your query-type, Athena and Spectrum both seem to be advertised more or less as 'drop on S3 and go'. It seems fair to take them more or less at their word for the benchmark or a very limited search of the parameter space.
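As a rough sketch of what a "(very) limited parameter search" could look like, the snippet below enumerates a small grid of layouts to load and time. The candidate sizes and the name `limited_grid` are hypothetical, picked only to show the shape of the search; they are not recommendations from AWS or from this benchmark.

```python
from itertools import product

MiB = 1024 ** 2

# Assumed candidate values; a real run would choose its own short list.
FILE_SIZES = [128 * MiB, 512 * MiB, 1024 * MiB]
ROW_GROUP_SIZES = [64 * MiB, 128 * MiB]

def limited_grid(file_sizes, row_group_sizes):
    """Return every (file size, row group size) pair where the group fits in the file."""
    return [
        (f, r)
        for f, r in product(file_sizes, row_group_sizes)
        if r <= f
    ]

configs = limited_grid(FILE_SIZES, ROW_GROUP_SIZES)
for file_size, row_group_size in configs:
    print(f"file={file_size // MiB} MiB, row group={row_group_size // MiB} MiB")
```

With three file sizes and two block sizes this is six layouts in total, small enough to run the whole benchmark against each one.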

mike-weinberg commented 3 years ago

@russellpierce, as I mentioned in https://github.com/fivetran/benchmark/issues/13, Athena and Spectrum have been shown elsewhere to be so sensitive to file type that getting their performance close to that of normal warehouses requires significant work. Even after all of that optimization, Athena and Spectrum are at best on par with BigQuery (as you were hinting at when you said "BigQuery-like").

For that reason I will be closing this issue (for now!)