Closed andyndang closed 2 years ago
What version are you testing?
The latest version from master.
I've managed to narrow it down to this line: https://github.com/apache/datasketches-postgresql/blob/master/src/frequent_strings_sketch_c_adapter.cpp#L93
Seems that frequent_items_sketch doesn't have an explicit destructor and that might be why it works some of the time but fails for some other?
Looks like we don't have an explicit destructor in FI sketch: https://github.com/apache/datasketches-cpp/blob/master/fi/include/frequent_items_sketch.hpp
Right, because the default destructor should be sufficient since there are no pointers in the class. The hash map does have a destructor, and that's automatically called when it goes out of scope.
The master branch of the C++ library has unreleased, API-breaking changes. Please try the latest release tag.
Would love to but it looks like the code in this repo isn't compatible with the latest datasketches-cpp branch and I'm not experienced enough with C++ to debug the errors. Seems to require some update on this library as well.
However it doesn't look like the deallocator for reverse_hash_map has changed since 3.4: https://github.com/apache/datasketches-cpp/blob/3.4.0/fi/include/reverse_purge_hash_map_impl.hpp#L98 versus: https://github.com/apache/datasketches-cpp/blob/master/fi/include/reverse_purge_hash_map_impl.hpp#L98
I threw in a bunch of printf statements into the deallocator (see code below) and it looks like it failed to deallocate the first entry and encountered exception with this sketch
2022-10-26 07:34:34.485 UTC [1] LOG: database system is ready to accept connections
Attempting to deallocate hashmap. Size: 64. Num active: 44
Deacllocating active item index: 0
SQL
SELECT frequent_strings_sketch_result_no_false_negatives(cast(decode('BAEKBwYAAAAsAAAAAAAAAD4fAAAAAAAAUgAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAQAAAAAAAAABAAAAAAAAAAEAAAAAAAAAEgAAAC00MzguNzE3NDU5MjM1MDI2MREAAAAxNzA3LjIwOTEzMzYwODQ4MREAAAA2NTUuMDE0OTI4NzQ5MDk4MxAAAAA2Ny42NDM4MDI4MTAzMzM2EgAAAC01MzEuMTIyMzA3NTQ0MTg1OREAAAA4MzMuMzkwNzE3MTI3NjQxOBEAAAA2MTQuNzA1ODYyMjk0MDEyNxIAAAAxMTg4Ljc0ODkxNTA3NjM5MzQRAAAAMzI3LjA0MjI2NzM1NjQ4NjQRAAAAOTk1Ljk5NTgwNjgzNjIwODERAAAANzA1Ljg2NDEwMTE2NzkyNTQSAAAAMTQ0My4xMTQzOTUyMjA4Mjk1EgAAADMzMC4wODM4NDgyOTAxMzA5NhAAAAA3NzkuODExMDc3MDI3NTU5EgAAADE0MjguMzA2MDM0NTY5Njc2MxIAAAAtNjc5LjQ5Mzc4NTcwNDk2NTURAAAAMTIwOS4zMzY0NTU4MDc5MDMSAAAAMTI1MS4yNDAwNDIzOTcyODk1EgAAADQzOS4xNzg0MDUyOTU2NzI0NhMAAAAtMTc1Ljg2OTAwNTA5MDM2OTU0EgAAADEzNjguMjUzNDk3MzMyNjU3NBEAAAA3ODMuNTg4MDA4NzI2MjM3MxIAAAAxMDE1LjM5MjI5NDQ4NTE1NTYSAAAANDYuOTQyNTQ1MzU1NzI1NDE0EgAAAC0xOS4wODk0NjMzMTk4OTEwNhEAAAAzMDguMDE5Njc0OTY3MjYzNBAAAAAzOTIuNDA4NTAyNTE0MzcyEgAAAC0zMTMuMzgzMjM3MzIwMDI2NhEAAAAxMTUyLjMyNzc0NTc4NzA3OBIAAAAxMTY5LjU4MDMxOTA1ODcyNzYSAAAAMTI0MC44NTAxMzY0MTEzOTk1EgAAADE0NzUuMDA0MDA2MjY4MjQxNxEAAAAzMzUuMDI1MjQ2ODc5OTgzNxEAAAAxMTg2LjM0NjMyNDgyOTYxMxEAAAA5MTMuMzE3NzU1MDQ4MjE4MRMAAAAtMTYwLjU2NzMwNTkzNTcyNTA3EgAAADExODUuNjkwODAxNjQwNjE4MhIAAAAzOTQuODQwMjI5MDE0NDYzNTUSAAAALTU0OS42NTQ5ODE0NjA3OTgxEgAAADEwMzguODc4MzIzMjc0OTU4MxIAAAAxNDU5LjUwODA4NjUwMjEwODkRAAAAMjE0LjM5MTYxMzAxMjg3NzkQAAAAODIxLjUzNTExMjI4ODk0MRIAAAAyMTk1Ljk4MTcxNDE4NTkwODU=', 'base64') as frequent_strings_sketch))
My 101 debug code
template<typename K, typename V, typename H, typename E, typename A>
reverse_purge_hash_map<K, V, H, E, A>::~reverse_purge_hash_map() {
const uint32_t size = 1 << lg_cur_size_;
std::cout << "Attempting to deallocate hashmap. Size: " << size << ". Num active: " << num_active_ << std::endl;
if (num_active_ > 0) {
for (uint32_t i = 0; i < size; i++) {
if (is_active(i)) {
std::cout << "Deacllocating active item index: " << i << std::endl;
keys_[i].~K();
std::cout << "Done deallocating index: " << i << std::endl;
if (--num_active_ == 0) break;
}
}
}
std::cout << "Deallocating keys" << std::endl;
if (keys_ != nullptr) {
allocator_.deallocate(keys_, size);
}
std::cout << "Done with keys. Deallocating values" << std::endl;
if (values_ != nullptr) {
AllocV av(allocator_);
av.deallocate(values_, size);
}
std::cout << "Done with values. Deallocating states" << std::endl;
if (states_ != nullptr) {
AllocU16 au16(allocator_);
au16.deallocate(states_, size);
}
std::cout << "Done with states" << std::endl;
}
This is a very helpful issue report. We'll start looking into it.
Edited my comment refer to destructors, which is what i was describing, even though i said constuctor.
Unrelated to the crash, I looked at the sketch and I saw floating-point numbers as strings there. I am a bit puzzled what the use case might be. What if a number is slightly off? Say by 1 count in 13th decimal place? For sketch it would be a different item altogether. Is this intentional?
I could not reproduce the problem so far. I have PostgreSQL 14.3 on my MacBook, and your query works just fine.
@AlexanderSaydakov I built the container directly from this master branch in Docker and execute the query against the container.
Regarding numbers in sketches, those are intentional. We just use this sketch in a generic sense so sometimes numbers get through
The current master in datasketches-cpp shouldn't even be compatible with the current datasketches-postgresql. We've changed template parameters for a number of sketches.
@jmalkin yeah I didn't test it against the latest datasketches - too many incomptabile changes. However the code that triggered the segfault doesn't look like it's been changed (see the above comment). I suspect that this might happen in the latest ds branch as well and will look into creating a unit test for this
I managed to reproduce the problem, and I think I know the cause. I will fix shortly.
@andyndang If I am right, this is a small and subtle mistake of mine. Take a look at the pull request: just removing "std::" prefix in one place should fix this (I hope). Could you give this a try in your environment?
@andyndang Did you have a chance to give this a go? I would like to close this issue if the problem is resolved. Thank you.
@AlexanderSaydakov the fix works! Thank you! I'll close the issue
Hi,
We are testing the postgres extension and came across the issue of segfaulting when querying the fi strings sketches. We built these sketches in other environments (Java/Python) and passed them into postgres.
It doesn't occur with all the sketches, here's an example that would trigger the exception
The above sketch (base64 encoded) can be parsed fine with datasketches Python 3.5.0 from Pypi.
Postgers logs doesn't show much information either.