constantinpape / z5

Lightweight C++ and Python interface for datasets in zarr and N5 format
MIT License

Error while creating dataset using blosc compression (c++) #197

Closed: KestutisMa closed this issue 2 years ago

KestutisMa commented 2 years ago

When calling z5::createDataset(f, dsName, "float32", shape, chunks, "blosc"); I get the following error (without "blosc", everything works):

terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at

The stack trace shows that it is thrown at: https://github.com/constantinpape/z5/blob/2180c9d6bd45a7c414bb353ecb3312e70f974491/include/z5/types/types.hxx#L257

Debugging doesn't show meaningful variable values, but I am not very experienced at debugging templates.

Could you confirm that z5 is working with blosc for you?

Arch Linux, gcc 11

constantinpape commented 2 years ago

This means that the codec is not available, which is a bit curious. Are you building z5 from source? I could push a commit to improve the error message that would clarify a lot what's going on here. (Should have done this a long time ago.)
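Some background on the error: std::map::at throws std::out_of_range when the key is missing, and with libstdc++ the message is only "map::at", which is why the report above does not say which codec failed. Below is a minimal sketch of the kind of clearer lookup discussed here. The Codec enum, the registry contents and lookupCodec are hypothetical illustrations, not z5's actual code in types.hxx.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical codec identifiers; NOT z5's z5::types::Compressor.
enum class Codec { raw, zlib, blosc };

// Hypothetical registry: assume only codecs enabled at build time are listed.
const std::map<std::string, Codec>& availableCodecs() {
    static const std::map<std::string, Codec> registry = {
        {"raw", Codec::raw},
        {"zlib", Codec::zlib},
        // "blosc" is absent, e.g. because it was not enabled in this build.
    };
    return registry;
}

// registry.at(name) would throw std::out_of_range("map::at") for a missing key.
// An explicit find() lets us name the missing codec in the message instead.
Codec lookupCodec(const std::string& name) {
    const auto& registry = availableCodecs();
    const auto it = registry.find(name);
    if (it == registry.end()) {
        throw std::out_of_range("Codec \"" + name +
                                "\" is not available in this build");
    }
    return it->second;
}
```

Throwing std::out_of_range again keeps existing catch clauses working; only the message changes, so a call like lookupCodec("blosc") now reports which codec is missing.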

KestutisMa commented 2 years ago

I could push a commit to improve the error message that would clarify a lot what's going on here.

That would be great. No, I am not building from source; I use CMake, with conda as the package manager for the C++ libraries (z5py, blosc). My main.cpp:

#define WITH_ZLIB

CMakeLists.txt:

list(APPEND CMAKE_PREFIX_PATH "/home/lab/.conda/envs/py39/")
...
#find_package(BLOSC REQUIRED) # 
link_directories(/home/lab/.conda/envs/py39/lib) # need to add manually, as blosc in conda doesn't contain cmake find_package script
target_link_libraries(myProgram PRIVATE blosc ...

P.S. Compression is working fine with zlib.

There is also https://github.com/Blosc/c-blosc2, which states that:

C-Blosc2 is the new major version of C-Blosc, and tries hard to be backward compatible with both the C-Blosc1 API and its in-memory format.

constantinpape commented 2 years ago

@KestutisMa I had another look at this, and I think there is indeed an issue in z5 when calling createDataset with blosc compression and no further compression options. (This works for other compressors like zlib, which are simpler than blosc.)

I don't have time to fully debug this now, but there should be a simple workaround: call createDataset with explicit compression options:

z5::types::CompressionOptions copts;
z5::types::defaultCompressionOptions("blosc", copts, true);
z5::createDataset(f, dsName, "float32", shape, chunks, "blosc", copts);

Note that this should also work for any other compressor.
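The workaround amounts to making sure every option a compressor reads is actually present. A generic way to express that idea is to merge user-supplied options with per-compressor defaults before anything looks them up. This is a minimal sketch under assumptions: the option-map type, the keys "level" and "shuffle" and their default values are hypothetical; only the "codec" key and the lz4 default appear in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-in for an options map; z5's real CompressionOptions may differ.
using OptionValue = std::variant<int, bool, std::string>;
using CompressionOptions = std::map<std::string, OptionValue>;

// Hypothetical blosc defaults; "level" and "shuffle" values are assumptions.
CompressionOptions bloscDefaults() {
    return {{"codec", std::string("lz4")}, {"level", 5}, {"shuffle", 1}};
}

// Adds every default the caller did not set, so later .at() lookups
// on these keys cannot throw std::out_of_range.
CompressionOptions withDefaults(CompressionOptions opts) {
    for (const auto& [key, value] : bloscDefaults()) {
        opts.emplace(key, value);  // no-op if the caller already set this key
    }
    return opts;
}
```

Using emplace rather than operator[] means user-provided values always win over the defaults.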

Let me know if this fixes the issue.

KestutisMa commented 2 years ago

I still got the same error using copts. By the way, the first argument of defaultCompressionOptions is of type z5::types::Compressor, not a string, so the call is z5::types::defaultCompressionOptions(z5::types::Compressor::blosc, copts, true);

constantinpape commented 2 years ago

I just released 2.0.13, which fixes these issues (it will probably be on conda-forge tomorrow). Here's an updated test that makes sure it works with blosc: https://github.com/constantinpape/z5/blob/master/src/test/test_dataset.cxx#L231-L233 (please be aware that this only works for data in the zarr format).

KestutisMa commented 2 years ago

It's working now, thank you! Additional lines were not needed:

        Metadata fMeta(true);
        filesystem::writeMetadata(fileHandle_, fMeta);

Previously I was using version 2.0.11.

I see that the default blosc codec is lz4: https://github.com/constantinpape/z5/blob/1ac8fc9f05c64cb2fe2179df897337d060b2aa90/src/python/module/z5py/dataset.py#L68

Are other codecs also supported? It looks like the benchmarks show that BloscLZ is faster than memcpy in some cases: https://www.blosc.org/pages/synthetic-benchmarks/

constantinpape commented 2 years ago

Additional lines were not needed:

        Metadata fMeta(true);
        filesystem::writeMetadata(fileHandle_, fMeta);

Yes, this is only necessary if you don't have an existing zarr file yet.
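As a rough, library-independent sketch of the "only if you don't have an existing zarr file yet" logic: the code below checks for the zarr v2 group marker .zgroup with std::filesystem and writes {"zarr_format": 2} only when it is missing. It does not use z5; ensureZarrGroup is a hypothetical helper, and it assumes that the file metadata written by the two lines above corresponds to this .zgroup file for zarr.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Writes zarr v2 group metadata only if the container does not exist yet.
// Returns true if .zgroup was written, false if it was already present.
// Hypothetical helper, not z5's filesystem::writeMetadata.
bool ensureZarrGroup(const fs::path& root) {
    const fs::path zgroup = root / ".zgroup";
    if (fs::exists(zgroup)) {
        return false;  // existing zarr file: leave its metadata untouched
    }
    fs::create_directories(root);
    std::ofstream out(zgroup);
    out << "{\"zarr_format\": 2}";
    return true;
}
```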

Are other codecs also supported? It looks like the benchmarks show that BloscLZ is faster than memcpy in some cases: https://www.blosc.org/pages/synthetic-benchmarks/

I guess this depends on how blosc was built. The parameter is simply passed on to the blosc API (see https://github.com/constantinpape/z5/blob/master/include/z5/compression/blosc_compressor.hxx#L32). You can control it with the codec entry in the CompressionOptions, e.g.

z5::types::CompressionOptions copts;
copts["codec"] = std::string("some-compressor-name");

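and pass this to createDataset. (Note that the explicit std::string is important: otherwise std::variant may store the string literal as a bool, because the built-in conversion from const char* to bool is preferred over constructing a std::string.)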
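A minimal sketch combining both points: the codec name is checked against the compressor names c-blosc 1.x defines (blosclz, lz4, lz4hc, snappy, zlib, zstd; which of them actually work depends on how blosc was built, and the names are lowercase, so BloscLZ is spelled blosclz), and it is stored as a std::string. The option-map type and setBloscCodec are hypothetical stand-ins, not z5's CompressionOptions. Taking the name as const std::string& forces a literal to become a std::string before it reaches the variant, which sidesteps the bool pitfall of the original C++17 converting-constructor rules (changed in C++20 by P0608).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <variant>

// Hypothetical stand-in for an options map holding int, bool or string values.
using OptionValue = std::variant<int, bool, std::string>;
using CompressionOptions = std::map<std::string, OptionValue>;

// Compressor names known to c-blosc 1.x; availability depends on the blosc build.
bool isKnownBloscCodec(const std::string& name) {
    static const std::set<std::string> names = {
        "blosclz", "lz4", "lz4hc", "snappy", "zlib", "zstd"};
    return names.count(name) > 0;
}

// Returns false and leaves opts unchanged for an unknown name.
// The const std::string& parameter converts literals up front,
// so the variant always receives a std::string, never a bool.
bool setBloscCodec(CompressionOptions& opts, const std::string& codec) {
    if (!isKnownBloscCodec(codec)) {
        return false;
    }
    opts["codec"] = codec;
    return true;
}
```

Usage: setBloscCodec(opts, "blosclz") stores the std::string "blosclz", while setBloscCodec(opts, "BloscLZ") is rejected because the names are case-sensitive.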

KestutisMa commented 2 years ago

It is working, thanks again!