apache / datasketches-postgresql

PostgreSQL extension providing approximate algorithms based on apache/datasketches-cpp
https://datasketches.apache.org
Apache License 2.0
85 stars 11 forks source link

Segfault when querying pre-built frequent_items_string sketch #54

Closed andyndang closed 1 year ago

andyndang commented 1 year ago

Hi,

We are testing the postgres extension and came across the issue of segfaulting when querying the fi strings sketches. We built these sketches in other environments (Java/Python) and passed them into postgres.

It doesn't occur with all the sketches, here's an example that would trigger the exception

SELECT frequent_strings_sketch_result_no_false_negatives(cast(decode('BAEKBwYAAAAsAAAAAAAAAD4fAAAAAAAAUgAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAEgAAAC0xOS4wODk0NjMzMTk4OTEwNhEAAAA5MTMuMzE3NzU1MDQ4MjE4MREAAAA2MTQuNzA1ODYyMjk0MDEyNxIAAAAxNDc1LjAwNDAwNjI2ODI0MTcQAAAANjcuNjQzODAyODEwMzMzNhAAAAA3NzkuODExMDc3MDI3NTU5EgAAADEzNjguMjUzNDk3MzMyNjU3NBEAAAA4MzMuMzkwNzE3MTI3NjQxOBMAAAAtMTYwLjU2NzMwNTkzNTcyNTA3EgAAADE0NDMuMTE0Mzk1MjIwODI5NREAAAAxMTUyLjMyNzc0NTc4NzA3OBEAAAAxMjA5LjMzNjQ1NTgwNzkwMxIAAAAxMjQwLjg1MDEzNjQxMTM5OTURAAAAMzM1LjAyNTI0Njg3OTk4MzcRAAAAMTE4Ni4zNDYzMjQ4Mjk2MTMSAAAANDYuOTQyNTQ1MzU1NzI1NDE0EgAAADExODUuNjkwODAxNjQwNjE4MhEAAAA3ODMuNTg4MDA4NzI2MjM3MxIAAAAxNDU5LjUwODA4NjUwMjEwODkRAAAAMzA4LjAxOTY3NDk2NzI2MzQSAAAAMTQyOC4zMDYwMzQ1Njk2NzYzEgAAADEwMzguODc4MzIzMjc0OTU4MxIAAAAtNTQ5LjY1NDk4MTQ2MDc5ODERAAAAMTcwNy4yMDkxMzM2MDg0ODERAAAAMzI3LjA0MjI2NzM1NjQ4NjQQAAAAODIxLjUzNTExMjI4ODk0MREAAAA3MDUuODY0MTAxMTY3OTI1NBEAAAA5OTUuOTk1ODA2ODM2MjA4MREAAAA2NTUuMDE0OTI4NzQ5MDk4MxMAAAAtMTc1Ljg2OTAwNTA5MDM2OTU0EgAAADEwMTUuMzkyMjk0NDg1MTU1NhIAAAAtNTMxLjEyMjMwNzU0NDE4NTkSAAAALTY3OS40OTM3ODU3MDQ5NjU1EgAAADExNjkuNTgwMzE5MDU4NzI3NhIAAAAxMTg4Ljc0ODkxNTA3NjM5MzQQAAAAMzkyLjQwODUwMjUxNDM3MhIAAAAzMzAuMDgzODQ4MjkwMTMwOTYSAAAAMzk0Ljg0MDIyOTAxNDQ2MzU1EgAAADEyNTEuMjQwMDQyMzk3Mjg5NRIAAAAtNDM4LjcxNzQ1OTIzNTAyNjESAAAAMjE5NS45ODE3MTQxODU5MDg1EQAAADIxNC4zOTE2MTMwMTI4Nzc5EgAAADQzOS4xNzg0MDUyOTU2NzI0NhIAAAAtMzEzLjM4MzIzNzMyMDAyNjY=', 'base64') as frequent_strings_sketch))

The above sketch (base64 encoded) can be parsed fine with datasketches Python 3.5.0 from Pypi.

Postgers logs doesn't show much information either.

2022-10-25 23:53:28.363 UTC [1] LOG:  database system is ready to accept connections

2022-10-25 23:53:35.733 UTC [1] LOG:  server process (PID 118) was terminated by signal 11: Segmentation fault
AlexanderSaydakov commented 1 year ago

What version are you testing?

andyndang commented 1 year ago

The latest version from master.

I've managed to narrow it down to this line: https://github.com/apache/datasketches-postgresql/blob/master/src/frequent_strings_sketch_c_adapter.cpp#L93

Seems that frequent_items_sketch doesn't have an explicit destructor and that might be why it works some of the time but fails for some other?

andyndang commented 1 year ago

Looks like we don't have an explicit destructor in FI sketch: https://github.com/apache/datasketches-cpp/blob/master/fi/include/frequent_items_sketch.hpp

jmalkin commented 1 year ago

Right, because the default destructor should be sufficient since there are no pointers in the class. The hash map does have a destructor, and that's automatically called when it goes out of scope.

The master branch of the C++ library has unreleased, API-breaking changes. Please try the latest release tag.

andyndang commented 1 year ago

Would love to but it looks like the code in this repo isn't compatible with the latest datasketches-cpp branch and I'm not experienced enough with C++ to debug the errors. Seems to require some update on this library as well.

However it doesn't look like the deallocator for reverse_hash_map has changed since 3.4: https://github.com/apache/datasketches-cpp/blob/3.4.0/fi/include/reverse_purge_hash_map_impl.hpp#L98 versus: https://github.com/apache/datasketches-cpp/blob/master/fi/include/reverse_purge_hash_map_impl.hpp#L98

andyndang commented 1 year ago

I threw in a bunch of printf statements into the deallocator (see code below) and it looks like it failed to deallocate the first entry and encountered exception with this sketch

2022-10-26 07:34:34.485 UTC [1] LOG:  database system is ready to accept connections
Attempting to deallocate hashmap. Size: 64. Num active: 44
Deacllocating active item index: 0

SQL

SELECT frequent_strings_sketch_result_no_false_negatives(cast(decode('BAEKBwYAAAAsAAAAAAAAAD4fAAAAAAAAUgAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAEgAAAC00MzguNzE3NDU5MjM1MDI2MREAAAAxNzA3LjIwOTEzMzYwODQ4MREAAAA2NTUuMDE0OTI4NzQ5MDk4MxAAAAA2Ny42NDM4MDI4MTAzMzM2EgAAAC01MzEuMTIyMzA3NTQ0MTg1OREAAAA4MzMuMzkwNzE3MTI3NjQxOBEAAAA2MTQuNzA1ODYyMjk0MDEyNxIAAAAxMTg4Ljc0ODkxNTA3NjM5MzQRAAAAMzI3LjA0MjI2NzM1NjQ4NjQRAAAAOTk1Ljk5NTgwNjgzNjIwODERAAAANzA1Ljg2NDEwMTE2NzkyNTQSAAAAMTQ0My4xMTQzOTUyMjA4Mjk1EgAAADMzMC4wODM4NDgyOTAxMzA5NhAAAAA3NzkuODExMDc3MDI3NTU5EgAAADE0MjguMzA2MDM0NTY5Njc2MxIAAAAtNjc5LjQ5Mzc4NTcwNDk2NTURAAAAMTIwOS4zMzY0NTU4MDc5MDMSAAAAMTI1MS4yNDAwNDIzOTcyODk1EgAAADQzOS4xNzg0MDUyOTU2NzI0NhMAAAAtMTc1Ljg2OTAwNTA5MDM2OTU0EgAAADEzNjguMjUzNDk3MzMyNjU3NBEAAAA3ODMuNTg4MDA4NzI2MjM3MxIAAAAxMDE1LjM5MjI5NDQ4NTE1NTYSAAAANDYuOTQyNTQ1MzU1NzI1NDE0EgAAAC0xOS4wODk0NjMzMTk4OTEwNhEAAAAzMDguMDE5Njc0OTY3MjYzNBAAAAAzOTIuNDA4NTAyNTE0MzcyEgAAAC0zMTMuMzgzMjM3MzIwMDI2NhEAAAAxMTUyLjMyNzc0NTc4NzA3OBIAAAAxMTY5LjU4MDMxOTA1ODcyNzYSAAAAMTI0MC44NTAxMzY0MTEzOTk1EgAAADE0NzUuMDA0MDA2MjY4MjQxNxEAAAAzMzUuMDI1MjQ2ODc5OTgzNxEAAAAxMTg2LjM0NjMyNDgyOTYxMxEAAAA5MTMuMzE3NzU1MDQ4MjE4MRMAAAAtMTYwLjU2NzMwNTkzNTcyNTA3EgAAADExODUuNjkwODAxNjQwNjE4MhIAAAAzOTQuODQwMjI5MDE0NDYzNTUSAAAALTU0OS42NTQ5ODE0NjA3OTgxEgAAADEwMzguODc4MzIzMjc0OTU4MxIAAAAxNDU5LjUwODA4NjUwMjEwODkRAAAAMjE0LjM5MTYxMzAxMjg3NzkQAAAAODIxLjUzNTExMjI4ODk0MRIAAAAyMTk1Ljk4MTcxNDE4NTkwODU=', 'base64') as frequent_strings_sketch))

My 101 debug code

template<typename K, typename V, typename H, typename E, typename A>
reverse_purge_hash_map<K, V, H, E, A>::~reverse_purge_hash_map() {
  const uint32_t size = 1 << lg_cur_size_;
  std::cout << "Attempting to deallocate hashmap. Size: " << size << ". Num active: " << num_active_ << std::endl;
  if (num_active_ > 0) {
    for (uint32_t i = 0; i < size; i++) {
      if (is_active(i)) {
        std::cout << "Deacllocating active item index: " << i << std::endl;
        keys_[i].~K();
        std::cout << "Done deallocating index: " << i << std::endl;

        if (--num_active_ == 0) break;
      }
    }
  }
  std::cout << "Deallocating keys" << std::endl;

  if (keys_ != nullptr) {
    allocator_.deallocate(keys_, size);
  }
  std::cout << "Done with keys. Deallocating values" << std::endl;

  if (values_ != nullptr) {
    AllocV av(allocator_);
    av.deallocate(values_, size);
  }
  std::cout << "Done with values. Deallocating states" << std::endl;

  if (states_ != nullptr) {
    AllocU16 au16(allocator_);
    au16.deallocate(states_, size);
  }
  std::cout << "Done with states" << std::endl;
}
jmalkin commented 1 year ago

This is a very helpful issue report. We'll start looking into it.

jmalkin commented 1 year ago

Edited my comment refer to destructors, which is what i was describing, even though i said constuctor.

AlexanderSaydakov commented 1 year ago

Unrelated to the crash, I looked at the sketch and I saw floating-point numbers as strings there. I am a bit puzzled what the use case might be. What if a number is slightly off? Say by 1 count in 13th decimal place? For sketch it would be a different item altogether. Is this intentional?

AlexanderSaydakov commented 1 year ago

I could not reproduce the problem so far. I have PostgreSQL 14.3 on my MacBook, and your query works just fine.

andyndang commented 1 year ago

@AlexanderSaydakov I built the container directly from this master branch in Docker and execute the query against the container.

Regarding numbers in sketches, those are intentional. We just use this sketch in a generic sense so sometimes numbers get through

jmalkin commented 1 year ago

The current master in datasketches-cpp shouldn't even be compatible with the current datasketches-postgresql. We've changed template parameters for a number of sketches.

andyndang commented 1 year ago

@jmalkin yeah I didn't test it against the latest datasketches - too many incomptabile changes. However the code that triggered the segfault doesn't look like it's been changed (see the above comment). I suspect that this might happen in the latest ds branch as well and will look into creating a unit test for this

AlexanderSaydakov commented 1 year ago

I managed to reproduce the problem, and I think I know the cause. I will fix shortly.

AlexanderSaydakov commented 1 year ago

@andyndang If I am right, this is a small and subtle mistake of mine. Take a look at the pull request: just removing "std::" prefix in one place should fix this (I hope). Could you give this a try in your environment?

AlexanderSaydakov commented 1 year ago

@andyndang Did you have a chance to give this a go? I would like to close this issue if the problem is resolved. Thank you.

andyndang commented 1 year ago

@AlexanderSaydakov the fix works! Thank you! I'll close the issue