elixir-nx / hnswlib

Elixir binding for the hnswlib library.
Apache License 2.0
52 stars 9 forks source link

Segmentation fault errors when loading a bad index file #3

Closed andrew-combs closed 1 year ago

andrew-combs commented 1 year ago

HNSWLib.Index.load_index/3 will fail catastrophically when loading an index file that is either missing or malformed. Checking for a missing file is simple enough, but checking the validity of the saved index is more complicated.

A possible solution would be to frame the index on write such that the file contains both (a) header information to describe the index configuration (i.e. index size, dimensionality, distance, etc...). Provide checksums on frames of index data (i.e. write a checksum on every N-bytes). The checksum can be an optionally configurable parameter (set when saving the index), and stored in the header information.

When reading the index file, if any errors occurred this can be captured and sent back with an {:error, "bad index file"} tuple or something similar.

andrew-combs commented 1 year ago

2 resolves the failure that occurs when the file is missing. I'll try to find some time to work on the file validation piece a little later in the week.

cocoa-xu commented 1 year ago

Hi @andrew-combs, thanks for reporting this issue! I'm thinking about fixing this and #2 in NIF where these exceptions were thrown. They should be captured in the try-catch block though. I'll look into the code.

cocoa-xu commented 1 year ago

I'll try to find some time to work on the file validation piece a little later in the week.

That's definitely one way to do it :). But I'm worry about it might be time-consuming to work on file validation. I think that perhaps it's better to address these segmentation faults in the NIF world, and convert them into error-tuples.

WDYT?

andrew-combs commented 1 year ago

Yeah on second glance I realized that the write-functions for this are deeper in the 3rd party code as well. Sticking to "let it crash" and try-catch blocks converted to error tuples is probably a good idea :)

andrew-combs commented 1 year ago

Super excited about this work by the way! Thanks for making the efforts here 👏🏼 👏🏼 👏🏼

cocoa-xu commented 1 year ago

Oh, I found the root cause for this issue and #2.

I need to reset the index->val to nullptr after it throws an std::runtime_error, otherwise, index->val will be freed the second time in the destructor of the NIF resource (thus causing a double free issue, which led to the crash). I've updated the PR correspondingly!

/cc @josevalim

josevalim commented 1 year ago

Closing in favor of PR!