Closed Tmonster closed 2 months ago
Keep v3 a float64 in polars, otherwise answers are no longer consistent with previous polars versions
I wouldn't say that's fair. There is quite a performance benefit with regard to cache sizes on f32
. That the numbers deviate isn't wrong. It is just a result of operating on a lower precision. I think the checks numbers just need to updated now the precision of the queries are adapted.
Hi @ritchie46,
Sorry for the wait, but I had other duckdblabs matters to attend to, and wanted to be sure the deviations were within tolerance.
I understand that the answer might be different because of the change from float64 to float32, but a tolerance of of 0.001 (or 0.1%) is allowed between answer for the same question between versions. I think this is an acceptable tolerance and solutions/answers should remain precise within this tolerance between versions. I've gone through many of the non-matching answers and while most respect the tolerance there are a few deviations that concern me
Take the following results for example (groupby q5).
┌───────────────────────────────────────────┬─────────────────────────────────────────┬────────────────────────────┬────────────────┬───────────┬───────────┐
│ t1.chk │ t2.chk │ question │ data │t1.version │t2.version │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├───────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────┼────────────────┼───────────┼───────────┤
│ 2849922064.0;7600000111.0;47498842805.648 │ 2849922064.0;7600000111.0;47271718912.0 │ sum v1:v3 by id6 │ G1_1e9_1e2_5_0 │ 0.19.8 │ 1.0.0 │
│ 2849922064.0;7600000111.0;47498842805.648 │ 2849922064.0;7600000111.0;47271723008.0 │ sum v1:v3 by id6 │ G1_1e9_1e2_5_0 │ 0.19.8 │ 1.0.0 │
│ 2849922064.0;7600000111.0;47498842805.648 │ 2849922064.0;7600000111.0;47271718912.0 │ sum v1:v3 by id6 │ G1_1e9_1e2_5_0 │ 0.20.31 │ 1.0.0 │
├───────────────────────────────────────────┴─────────────────────────────────────────┴────────────────────────────┴────────────────┴───────────┴───────────┤
chk is created using the following SQL queries
CREATE TABLE ans AS SELECT id6, sum(v1) AS v1, sum(v2) AS v2, sum(v3) AS v3 FROM x GROUP BY id6;
-- chk value
SELECT sum(v1) AS v1, sum(v2) AS v2, sum(v3) AS v3 FROM ans;
v1, v2, and v3 are then combined with a ';' separator.
Polars using float64 vs. float32 have results of 47498842805.648 and 47271718912 respectively for chk.v3
. The difference is 227123894 (or 0.4%). This deviation is too high, and can potentially lead to more incorrect results in future runs which I prefer not to debug.
other system results for q5. duckdb: 47498842806 spark: 47498842805.649 data.table: 47498842806 arrow: 47498842806
From what I've seen, no other systems use float32. Most likely because the results are no longer valid within the tolerances.
Because of the deviation, I will not be merging results that use float32. I thought about partial results, but if one question is omitted, then the whole solution posts "exception/internal error etc." on the leaderboard and is not ranked, which I also don't think is fair.
You can check these answers with the following SQL queries (in the branch Tmonster/polars_results_july_5)
-- create table
create table timings as select * from read_csv('time.csv');
-- polars results
select t1.chk, t2.chk, t1.question, t1.data, t1.version, t2.version from
timings t1, timings t2
where t1.chk != t2.chk
and t1.question = t2.question
and t1.task = t2.task
and t1.solution = 'polars'
and t1.data not like 'G1_1e9_2e%'
and t1.solution = t2.solution
and t1.task = 'groupby'
and t1.version < t2.version
and t2.version = '1.0.0' and t1.data = t2.data group by all order by t1.data;
select chk, question, data, version from timings
where question = 'sum v1:v3 by id6'
and data = 'G1_1e9_1e2_5_0'
and solution = 'duckdb';
I am going to merge these results today, and if you want to continue a discussion on moving to float32, we can move it to another issue/PR
From what I've seen, no other systems use float32. Most likely because the results are no longer valid within the tolerances.
I assumed DuckDB used float32 because of this line: https://github.com/duckdblabs/db-benchmark/blob/e54b17fd80fc6bc57be214bb88e5c849fa5b515a/duckdb/groupby-duckdb.R#L47
Of course I am fine with float64, and if you say all systems do so, great! I just want to make sure we all benchmark the same data-types otherwise it's not apples to apples.
Keep v3 a float64 in polars, otherwise answers are no longer consistent with previous polars versions