LDeakin / zarrs_tools

Various tools for creating and manipulating Zarr v3 data with the zarrs rust crate
Apache License 2.0

memory allocation failure #9

Closed joshmoore closed 13 hours ago

joshmoore commented 3 months ago

When attempting to convert a (100, 3, 2000, 2000, 2000) hypervolume, I run into a memory allocation failure:

 ~/.cargo/bin/zarrs_reencode big.zarr reencode.zarr --shard-shape 1,3,2000,2000,2000 --chunk-shape 1,1,250,250,250
[00:01:23/00:01:23] ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0/100 (0%)  rw:0.00/0.00 p:0.00
memory allocation of 192000000000 bytes failed
memory allocation of 192000000000 bytes failed
Aborted (core dumped)

LDeakin commented 3 months ago

That is a massive chunk and shard shape. Have you successfully used shards that large with any other Zarr libraries/tools? A (1,3,2000,2000,2000) shard with a 64-bit data type is 192GB (uncompressed)! Do you really want to write shards (files) potentially that large?
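As a quick sanity check, the 192GB figure can be reproduced from the shard shape alone, and it matches the failed allocation in the log byte for byte. The snippet below is a minimal sketch, not part of zarrs; `uncompressed_nbytes` is a hypothetical helper, and the 8-byte item size assumes the 64-bit data type mentioned above.

```python
from math import prod


def uncompressed_nbytes(shape, itemsize):
    """Bytes needed to hold one uncompressed block with `shape` elements."""
    return prod(shape) * itemsize


shard_shape = (1, 3, 2000, 2000, 2000)
itemsize = 8  # assumption: 64-bit data type (8 bytes per element)

nbytes = uncompressed_nbytes(shard_shape, itemsize)
print(nbytes)        # 192000000000, the size reported in the failed allocation
print(nbytes / 1e9)  # 192.0 (GB, decimal)
```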

zarrs does not currently support incrementally writing shards, but I can consider supporting that in the future for requests like this.

Chunk shape

The "chunk shape" is the read granularity. You want to choose a chunk shape on the order of kilobytes to low megabytes for efficient visualisation in tools like neuroglancer. A (1,1,250,250,250) chunk with a 64-bit data type is 125MB. I'd suggest something like (1,1,50,50,50), which is 1MB.

Shard shape

The "shard shape" is the write granularity. I recommend choosing a shard shape suited to parallel processing/writing. For a 64-bit data type, a (1,1,500,500,500) shard would be 1GB in memory and $\lessapprox$ 1GB on disk.
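To compare candidate shapes before running a conversion, the sizes and counts can be worked out up front. This is a rough sketch under stated assumptions: `layout_summary` is a hypothetical helper (not a zarrs API), the item size is 8 bytes for a 64-bit data type, and the shard shape is taken to be an exact multiple of the chunk shape along every axis.

```python
from math import prod


def layout_summary(array_shape, shard_shape, chunk_shape, itemsize):
    """Summarise uncompressed sizes and counts for a sharded layout."""
    # Assumption: each shard dimension is an exact multiple of the chunk dimension.
    for s, c in zip(shard_shape, chunk_shape):
        if s % c != 0:
            raise ValueError(f"shard dim {s} is not a multiple of chunk dim {c}")
    return {
        "chunk_bytes": prod(chunk_shape) * itemsize,
        "shard_bytes": prod(shard_shape) * itemsize,
        "chunks_per_shard": prod(s // c for s, c in zip(shard_shape, chunk_shape)),
        # Ceiling division: partial shards at the array edge still count as one.
        "shards_in_array": prod(-(-a // s) for a, s in zip(array_shape, shard_shape)),
    }


summary = layout_summary(
    array_shape=(100, 3, 2000, 2000, 2000),
    shard_shape=(1, 1, 500, 500, 500),
    chunk_shape=(1, 1, 50, 50, 50),
    itemsize=8,  # assumption: 64-bit data type
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the shapes suggested above, each chunk is 1MB and each shard is 1GB uncompressed, so a writer only needs to hold about 1GB per shard in memory rather than 192GB. Presumably these shapes would be passed through the same `--shard-shape` and `--chunk-shape` options used in the original command.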

joshmoore commented 3 months ago

> That is a massive chunk and shard shape. Have you successfully used shards that large with any other Zarr libraries/tools?

No :smile: I'm stress-testing everything at the moment. Sorry for not mentioning that earlier. I've got my head down and was trying to capture the output before that remote session ended. It definitely wasn't a complaint. zarrs is still top of the leaderboard ;)

> zarrs does not currently support incrementally writing shards, but I can consider supporting that in the future for requests like this.

:+1: