catid / leopard

Leopard-RS : O(N Log N) MDS Reed-Solomon Block Erasure Code for Large Data
BSD 3-Clause "New" or "Revised" License
137 stars, 24 forks

Progressive/incremental encoding? #22

Open 1f604 opened 1 year ago

1f604 commented 1 year ago

Hi, is it possible to do progressive/incremental encoding with this? It would significantly reduce the memory usage if we could feed in one block of the input file at a time instead of having to keep the whole file in memory.

malaire commented 1 year ago

Instead of feeding one block at a time, as you asked, you can feed a portion of every block at a time.

Encoding is done 64 bytes (from all blocks) at a time, so you can split your blocks into smaller portions (each a multiple of 64 bytes) and encode the portions separately.

For example, you could first encode the first 4 kB of each block, then the next 4 kB of each block, and so on. With the maximum of 65536 blocks this would require only 4 kB * 65536 = 256 MB of memory for an input file of any size (plus overhead; I don't remember how much overhead this algorithm has).

With 128 kB portions you would instead require at most 128 kB * 65536 = 8 GB of memory (plus overhead), and so on.
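The arithmetic above can be checked with a few lines of Python. This is only a sketch: `working_set_bytes` is a hypothetical helper, not part of Leopard, it treats 1 kB as 1024 bytes, and it counts only the input portions held in memory, not the overhead mentioned above.

```python
KIB = 1024
MAX_BLOCKS = 65536  # maximum block count quoted in this thread


def working_set_bytes(portion_bytes, block_count=MAX_BLOCKS):
    """Bytes needed to hold one portion of every block (input only, no overhead)."""
    if portion_bytes % 64 != 0:
        raise ValueError("portion size must be a multiple of 64 bytes")
    return portion_bytes * block_count


if __name__ == "__main__":
    for portion_kib in (4, 128):
        total = working_set_bytes(portion_kib * KIB)
        print(f"{portion_kib} KiB portions: {total // (KIB * KIB)} MiB")
```

The result does not depend on the file size, only on the portion size and the number of blocks.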

This approach should probably be combined with a fast SSD, since you will be doing a lot of reads and writes scattered all over the files.
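A rough Python sketch of the read pattern described above follows. It does not call Leopard: `xor_parity` is a stand-in encoder (plain byte-wise XOR) used only because, like the scheme described here, it can be applied to each portion independently. The names `iter_portions` and `encode_striped` are hypothetical, and the sketch assumes, for simplicity, that the portion size divides the block size evenly.

```python
import io
from functools import reduce

ALIGN = 64  # portions must be a multiple of 64 bytes, as noted above


def iter_portions(f, block_size, block_count, portion_size):
    """Yield (offset, [portion of block 0, portion of block 1, ...])."""
    if portion_size % ALIGN or block_size % portion_size:
        raise ValueError("portion_size must be a multiple of 64 that divides block_size")
    for offset in range(0, block_size, portion_size):
        portions = []
        for block in range(block_count):
            # One seek per block per portion: this is the scattered I/O.
            f.seek(block * block_size + offset)
            portions.append(f.read(portion_size))
        yield offset, portions


def xor_parity(chunks):
    """Stand-in encoder (NOT Leopard): byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def encode_striped(f, block_size, block_count, portion_size):
    """Encode one portion of every block at a time and assemble the parity block."""
    # A real tool would write each parity portion to disk instead of keeping it.
    parity = bytearray(block_size)
    for offset, portions in iter_portions(f, block_size, block_count, portion_size):
        parity[offset:offset + portion_size] = xor_parity(portions)
    return bytes(parity)


if __name__ == "__main__":
    data = bytes((i * 7 + i // 256) % 256 for i in range(1024))  # 4 blocks of 256 bytes
    striped = encode_striped(io.BytesIO(data), 256, 4, 64)
    whole = xor_parity([data[i * 256:(i + 1) * 256] for i in range(4)])
    print(striped == whole)
```

Only `portion_size * block_count` bytes of input are held at once, at the cost of one seek per block for every portion, which is why fast random-access storage helps.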