fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

Update lz4 and zstd version? #46

Closed mareksimunek closed 4 years ago

mareksimunek commented 4 years ago

Hi, I was looking for codecs backported to Hadoop 2.7, and your amazing work does exactly what I was looking for.

I noticed the bundled library versions are quite old: zstd 1.0.1 (4 years old) and lz4 1.3.0 (6 years old).

Would it be possible to upgrade them, please?

I would help, but I am no expert in JNI.

mareksimunek commented 4 years ago

Hi @carlomedas, I know you are probably focused on other projects. Is there any chance you could give some hints on how to do the update?

carlomedas commented 4 years ago

I'll see if we can make it a Fing company GitHub project so we can maintain it officially, as I'm kind of out of personal time for this. We've been using it in our big-data architecture, so it makes a lot of sense.

The problem with updating is not dropping in the latest libraries, which is very fast unless the interface headers changed signatures. The more problematic point is building the library on all platforms before releasing a new version, as I no longer have all the build VMs. Let me propose it as a Fing project and I'll let you know.

Last point: even after updating the libraries, the binary compressed data is not going to change, so you can start using it today and switch seamlessly when an updated version is available.

mareksimunek commented 4 years ago

Thank you very much for the effort.

> The problem with updating is not dropping in the latest libraries, which is very fast unless the interface headers changed signatures. The more problematic point is building the library on all platforms before releasing a new version, as I no longer have all the build VMs.

That's what I am afraid of: that the interfaces have changed after that many years. As far as platform support goes, I will be a little selfish and say Linux is all I need.

> Last point: even after updating the libraries, the binary compressed data is not going to change, so you can start using it today and switch seamlessly when an updated version is available.

I tried ZSTD and it needs a lot more memory (20-30%) than the other compression codecs (GZIP, LZ4, SNAPPY), so I am not sure whether that is the nature of the algorithm or just the old version, where they didn't focus on memory consumption. But its compression ratio is the best.

carlomedas commented 4 years ago

Let me confirm we are progressing on that. 4mc has been transferred to the Fing company, and we will make sure to keep it up to date.

mareksimunek commented 4 years ago

Once again, thanks. :) Looking forward to the updates.

carlomedas commented 4 years ago

Stay tuned, a new release should be coming this week.

carlomedas commented 4 years ago

Done by @noodlesbad!