Blosc / bloscpack

Command line interface to and serialization format for Blosc
BSD 3-Clause "New" or "Revised" License

Allow a better calculation for chunksize that is actually divisible by typesize #52

Closed: FrancescAlted closed this issue 8 years ago

FrancescAlted commented 8 years ago

For example, when trying to compress an image file with 24bit depth:

$ ll 24bit.bpt 
-rwx------ 1 faltet faltet 1377618 jul 26 14:15 24bit.bpt*

$ blpk -f c -t 3 24bit.bpt 24bit-shuffle.blp
Traceback (most recent call last):
  File "/home/faltet/miniconda/bin/blpk", line 11, in <module>
    sys.exit(main())
  File "/home/faltet/miniconda/lib/python2.7/site-packages/bloscpack/cli.py", line 457, in main
    metadata_args=MetadataArgs())
  File "/home/faltet/miniconda/lib/python2.7/site-packages/bloscpack/file_io.py", line 465, in pack_file
    metadata_args=metadata_args)
  File "/home/faltet/miniconda/lib/python2.7/site-packages/bloscpack/abstract_io.py", line 127, in pack
    (double_pretty_size(chunk_size), blosc_args.typesize)
bloscpack.exceptions.ChunkSizeTypeSizeMismatch: chunk_size: '1.0M (1048576B)' is not divisible by typesize: '3'

whereas if we help bloscpack by passing a chunksize explicitly (via -z):

$ blpk -f c -t 3 -z 1377618 24bit.bpt 24bit-shuffle.blp

$ ll 24bit-shuffle.blp
-rw-rw-r-- 1 faltet faltet 40976 jul 27 17:10 24bit-shuffle.blp
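The failure and the workaround come down to one divisibility check. The sketch below only mirrors the condition stated in the exception message. It is not bloscpack's actual code, and `chunk_size_ok` is a hypothetical name. The default of 1048576 bytes leaves a remainder of 1 when divided by 3. The file size 1377618, passed via -z, is an exact multiple of 3 (3 × 459206).

```python
DEFAULT_CHUNK_SIZE = 1024 * 1024  # 1.0M (1048576B), as reported in the traceback


def chunk_size_ok(chunk_size: int, typesize: int) -> bool:
    """Hypothetical helper: True if chunk_size is a whole multiple of typesize."""
    return chunk_size % typesize == 0


print(chunk_size_ok(DEFAULT_CHUNK_SIZE, 3))  # False: 1048576 % 3 == 1
print(chunk_size_ok(1377618, 3))             # True: the whole file fits in one chunk
```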

A more adaptive chunksize calculation would save the user from having to pass the chunksize manually.
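One possible adaptive rule is sketched below. It rounds the requested chunk size down to the nearest multiple of typesize, and never goes below a single element. This is an assumption about how the feature could work, not something bloscpack implements. `adjust_chunk_size` is a hypothetical helper. For the example above, it would turn the 1048576-byte default into 1048575 bytes (3 × 349525).

```python
def adjust_chunk_size(chunk_size: int, typesize: int) -> int:
    """Hypothetical helper: round chunk_size down to a multiple of typesize.

    The result is never smaller than one element (typesize bytes).
    """
    if typesize <= 0:
        raise ValueError("typesize must be positive")
    return max(typesize, (chunk_size // typesize) * typesize)


print(adjust_chunk_size(1024 * 1024, 3))  # 1048575
print(adjust_chunk_size(1024 * 1024, 8))  # 1048576 (already divisible, unchanged)
```

Rounding down keeps each chunk at or below the requested size. Rounding up would be an equally valid alternative if staying under the limit does not matter.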

This example is based on: https://github.com/Cyan4973/zstd/issues/256

esc commented 8 years ago

The chunk size and typesize are hardcoded defaults that don't depend on the size of the input. If the user changes those values, isn't it his or her responsibility to "get it right"? But then again, if you want this feature you should implement it, I guess.

FrancescAlted commented 8 years ago

Indeed. This is just a friendly reminder about the convenience of this.