duckdblabs / db-benchmark

reproducible benchmark of database-like ops
https://duckdblabs.github.io/db-benchmark/
Mozilla Public License 2.0

Polars: rechunk and write string views #79

Closed ritchie46 closed 3 weeks ago

ritchie46 commented 1 month ago

Polars has changed, and the current benchmark setup no longer runs it correctly. On every benchmark run, the memory map triggers a full rechunk and a deserialization from Arrow large string to string view, which isn't what is being benchmarked here.

Can you re-run Polars with these changes and the latest release?
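
For context, the changes are roughly of the following shape. This is a minimal sketch, not the actual PR; the file paths and dataset name are illustrative placeholders.

```python
import polars as pl

# Illustrative sketch: do the expensive conversion work once, up front,
# rather than on every timed run.
df = pl.read_csv("data/G1_1e7_1e2_5_0.csv")  # placeholder path

# Rechunk into a single contiguous chunk so the first timed query does
# not pay for a full rechunk.
df = df.rechunk()

# Persist in Polars' native string layout so a later memory map does not
# trigger a large-string -> string-view deserialization pass.
df.write_ipc("data/G1_1e7_1e2_5_0.feather")  # placeholder path
```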

P.S. I noted that DuckDB creates the tables with the Float (32-bit) data type, so I updated Polars to use Float32 instead of Float64 to make it apples to apples. I think we should ensure that all solutions use similar data types.
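
A sketch of that dtype alignment, assuming value columns named per the groupby dataset convention (this is not the benchmark's actual loading code):

```python
import polars as pl

# Toy frame standing in for a groupby dataset; v3 is the float column.
df = pl.DataFrame({"id3": ["a", "b"], "v1": [1, 2], "v3": [0.5, 1.5]})

# Cast every Float64 column to Float32 so aggregations run at the same
# precision as DuckDB's FLOAT (32-bit) tables.
df = df.with_columns(pl.col(pl.Float64).cast(pl.Float32))
print(df.schema)  # v3 is now Float32
```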

ritchie46 commented 3 weeks ago

Ah, yes. I will do a rebase. Thanks for setting the correct writing locations.

On another note: I know this benchmark was inherited and has always had somewhat vague allowances for startup work (e.g. writing to tables/disk, converting string columns to enums/categoricals). I am curious whether all solutions use the tables as-is, or whether some kind of metadata is built along the way.

Tmonster commented 3 weeks ago

I've looked at a lot of these startup scripts, and a number of them do convert the data to some internal format. Once it is in that internal format, I'm sure some of the solutions use statistics. What I want to prevent, however, is code designed to make a system run faster for specific queries, for example setting some flag for the q7 groupby and then disabling it again for q8.

ritchie46 commented 2 weeks ago

@Tmonster any update on a rerun?

Tmonster commented 2 weeks ago

I should have time to set it up and run it tonight, and should be able to update the results by the end of the week.

Tmonster commented 2 weeks ago

Hi Ritchie, I re-ran the benchmark on Polars v1.0.0 and now I am having issues with q3 and q5 producing incorrect results compared to previous versions. You can check out my branch here to look at the results, but this line filters out the Polars results for groupby q3 and q5 so the report can run.

I wrote a little script to produce some of the differing chk values (a sketch of such a check appears after the table below). There is a tolerance, but these results fall outside it.

Again, it seems to happen only with datasets that have null values.

I don't really have the bandwidth to track down which recent version introduced the breaking change and fix it. Is there someone on your team (or you) who could look into this? Again, only groupby q3 and q5 are affected.

```
┌────────────────┬───────────────────────────────────────────┬─────────────────────────────────────────┬───────────────────────┬─────────┬─────────┐
│      data      │                   chk_1                   │                  chk_2                  │       question        │   v1    │   v2    │
│    varchar     │                  varchar                  │                 varchar                 │        varchar        │ varchar │ varchar │
├────────────────┼───────────────────────────────────────────┼─────────────────────────────────────────┼───────────────────────┼─────────┼─────────┤
│ G1_1e9_1e2_5_0 │ 2849922064.0;474988853.389                │ 2849922064.0;475124704.0                │ sum v1 mean v3 by id3 │ 0.20.31 │ 1.0.0   │
│ G1_1e9_1e2_5_0 │ 2849922064.0;7600000111.0;47498842805.648 │ 2849922064.0;7600000111.0;47271714816.0 │ sum v1:v3 by id6      │ 0.20.31 │ 1.0.0   │
│ G1_1e9_1e2_5_0 │ 2849922064.0;7600000111.0;47498842805.648 │ 2849922064.0;7600000111.0;47271718912.0 │ sum v1:v3 by id6      │ 0.20.31 │ 1.0.0   │
│ G1_1e9_1e2_5_0 │ 2849922064.0;474988853.389                │ 2849922064.0;474947168.0                │ sum v1 mean v3 by id3 │ 0.20.31 │ 1.0.0   │
│ G1_1e7_1e2_5_0 │ 28498857.0;4749467.632                    │ 28498857.0;4749467.5                    │ sum v1 mean v3 by id3 │ 0.20.31 │ 1.0.0   │
│ G1_1e9_1e2_5_0 │ 2849922064.0;7600000111.0;47498842805.648 │ 2849922064.0;7600000111.0;47271714816.0 │ sum v1:v3 by id6      │ 0.19.8  │ 1.0.0   │
│ G1_1e8_1e2_5_0 │ 284994735.0;47500172.658                  │ 284994735.0;47500520.0                  │ sum v1 mean v3 by id3 │ 0.19.8  │ 1.0.0   │
│ G1_1e9_1e2_5_0 │ 2849922064.0;474988853.389                │ 2849922064.0;474947168.0                │ sum v1 mean v3 by id3 │ 0.19.8  │ 1.0.0   │
│ G1_1e8_1e2_5_0 │ 284994735.0;759971497.0;4750083909.4      │ 284994735.0;759971497.0;4749809152.0    │ sum v1:v3 by id6      │ 0.20.31 │ 1.0.0   │
│ G1_1e7_1e2_5_0 │ 28498857.0;4749467.632                    │ 28498857.0;4749467.5                    │ sum v1 mean v3 by id3 │ 0.19.8  │ 1.0.0   │
│ G1_1e7_1e2_5_0 │ 28498857.0;4749467.632                    │ 28498857.0;4749468.0                    │ sum v1 mean v3 by id3 │ 0.19.8  │ 1.0.0   │
│ G1_1e7_1e2_5_0 │ 28498857.0;75988394.0;474969574.048       │ 28498857.0;75988394.0;474969856.0       │ sum v1:v3 by id6      │ 0.19.8  │ 1.0.0   │
│ G1_1e8_1e2_5_0 │ 284994735.0;47500172.658                  │ 284994735.0;47500004.0                  │ sum v1 mean v3 by id3 │ 0.19.8  │ 1.0.0   │
│ G1_1e9_1e2_5_0 │ 2849922064.0;7600000111.0;47498842805.648 │ 2849922064.0;7600000111.0;47271718912.0 │ sum v1:v3 by id6      │ 0.19.8  │ 1.0.0   │
│ G1_1e7_1e2_5_0 │ 28498857.0;4749467.632                    │ 28498857.0;4749468.0                    │ sum v1 mean v3 by id3 │ 0.20.31 │ 1.0.0   │
│ G1_1e8_1e2_5_0 │ 284994735.0;47500172.658                  │ 284994735.0;47500004.0                  │ sum v1 mean v3 by id3 │ 0.20.31 │ 1.0.0   │
│ G1_1e7_1e2_5_0 │ 28498857.0;75988394.0;474969574.048       │ 28498857.0;75988394.0;474969856.0       │ sum v1:v3 by id6      │ 0.20.31 │ 1.0.0   │
│ G1_1e8_1e2_5_0 │ 284994735.0;47500172.658                  │ 284994735.0;47500520.0                  │ sum v1 mean v3 by id3 │ 0.20.31 │ 1.0.0   │
│ G1_1e8_1e2_5_0 │ 284994735.0;759971497.0;4750083909.4      │ 284994735.0;759971497.0;4749809152.0    │ sum v1:v3 by id6      │ 0.19.8  │ 1.0.0   │
│ G1_1e9_1e2_5_0 │ 2849922064.0;474988853.389                │ 2849922064.0;475124704.0                │ sum v1 mean v3 by id3 │ 0.19.8  │ 1.0.0   │
├────────────────┴───────────────────────────────────────────┴─────────────────────────────────────────┴───────────────────────┴─────────┴─────────┤
│ 20 rows                                                                                                                                6 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
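
For reference, a minimal sketch of the kind of tolerance check mentioned above; the benchmark's actual validation script and threshold are not shown in this thread, so the rel_tol default here is an assumed placeholder.

```python
# Hypothetical sketch: chk values are semicolon-separated numbers, so
# compare two of them field by field against a relative tolerance.
def chk_within_tolerance(chk_1: str, chk_2: str, rel_tol: float = 1e-4) -> bool:
    a = [float(v) for v in chk_1.split(";")]
    b = [float(v) for v in chk_2.split(";")]
    if len(a) != len(b):
        return False
    return all(
        abs(x - y) <= rel_tol * max(abs(x), abs(y), 1.0)
        for x, y in zip(a, b)
    )

# A pair from the table above: within tolerance at this threshold.
print(chk_within_tolerance("28498857.0;4749467.632", "28498857.0;4749467.5"))
```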
ritchie46 commented 2 weeks ago

I am not sure the previous versions are more "correct". Is there an exact correct answer? This might be due to associativity reasons: floating-point addition is not associative, so summing in a different order gives slightly different results.

ritchie46 commented 2 weeks ago

@Tmonster we switched from Float64 to Float32. I think this is the reason for the differences you see.
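
To illustrate the effect (NumPy here is just a convenient stand-in for any float32 accumulation, not the benchmark code):

```python
import numpy as np

# Sum ten million values in float64 vs float32. The float32 total is
# rounded at every step and quantized to float32 precision at the end,
# the same flavor of drift as the chk differences in the table above.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 100.0, 10_000_000)

print(x.sum())                     # float64 accumulation
print(x.astype(np.float32).sum())  # float32 accumulation drifts
```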

Tmonster commented 2 weeks ago

I'm running the benchmark again because I couldn't manage to reproduce the issue locally. I will update in an hour or so.

Tmonster commented 2 weeks ago

> I am not sure the previous versions are more "correct". Is there an exact correct answer? This might be due to associativity reasons.

There isn't an exact correct answer, since with floating-point arithmetic we allow for some tolerance. I think most of these results are within the tolerance.

I re-ran the benchmark, but am still getting invalid results. I'll try to narrow down the exact question and dataset further, since the most recent run doesn't produce the same table as above. I can publish partial results, but I imagine you want to wait until all results are in?

ritchie46 commented 2 weeks ago

> I can publish partial results

It at least would make more sense than the current results. :P

We test against these queries in our CI, so I am very surprised that anything other than the floating-point change would differ. Did you run the new Float32 code on the older versions as well, or do you compare against older data?