apache / datasketches-postgresql

PostgreSQL extension providing approximate algorithms based on apache/datasketches-cpp
https://datasketches.apache.org
Apache License 2.0
84 stars 11 forks source link

fix bug with undefined variable on the call stack for theta_sketch #40

Closed mikeizbicki closed 3 years ago

mikeizbicki commented 3 years ago

Fantastic library, thanks for the hard work!

I've been working on analyzing common crawl web data with postgres using your library (I hope to share some details with you in a little bit once it's all a bit more production ready). I noticed that my pg queries would randomly crash when I called the theta_sketch_get_estimate_and_bounds function. I did some digging, and what I found is that the sql code

CREATE OR REPLACE FUNCTION theta_sketch_get_estimate_and_bounds(theta_sketch) RETURNS double precision[]
    AS '$libdir/datasketches', 'pg_theta_sketch_get_estimate_and_bounds'
    LANGUAGE C STRICT IMMUTABLE;

CREATE OR REPLACE FUNCTION theta_sketch_get_estimate_and_bounds(theta_sketch, int) RETURNS double precision[]
    AS '$libdir/datasketches', 'pg_theta_sketch_get_estimate_and_bounds'
    LANGUAGE C STRICT IMMUTABLE;

binds to the same c function. But there is no default value set for the int in the first function call, which causes the value to be undefined. For whatever reason in my application, about 90% of the time this location in memory contains a valid value for the num_std_devs parameter (1,2,3), but the other 10% it would contain an invalid value and crash. This patch provides a default value so that this will never occur. When I use this code in my system, I no longer get the random crashes.

A quick look through the code of the other data structures make me believe that a similar problem exists there as well, but I didn't go through the effort of putting together a fix for those since I'm not using them and I'm not sure if you'd like my proposed solution here.

jmalkin commented 3 years ago

Apologies for the slow response. Thank you for finding this! We'll try to get on it very soon.

AlexanderSaydakov commented 3 years ago

My approach was to set the default in C, but there is a bug. I think this solution is fine except that the default should be 1, not 3.

mikeizbicki commented 3 years ago

I'll defer to #41 since my patch is incomplete and I don't have time to improve it.

AlexanderSaydakov commented 3 years ago

Did you have a chance to test if #41 fixes the problem you observed? Thank you.