byronknoll / cmix

cmix is a lossless data compression program aimed at optimizing compression ratio at the cost of high CPU/memory usage.
http://www.byronknoll.com/cmix.html
GNU General Public License v3.0

Try RWKV (better than LSTM for Language Modeling) #58

Open BlinkDL opened 10 months ago

BlinkDL commented 10 months ago

Hi Byron,

Would you like to try RWKV for better compression?

Some experiments by Fabrice: https://bellard.org/ts_server/ts_zip.html

You can do online training of RWKV and it learns fast.

Bo

byronknoll commented 10 months ago

That is a nice idea. If I have time, I might try it out in tensorflow-compress: https://github.com/byronknoll/tensorflow-compress

xXWarMachineRoXx commented 1 month ago

Isn't the compression ratio 7.382 for enwik9 with RWKV?

byronknoll commented 1 month ago

Fabrice Bellard tried out a pre-trained RWKV model on enwik9: https://bellard.org/ts_zip/

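For usage in tensorflow-compress, the model would not be pre-trained: it would be trained online during compression and decompression. That would give a worse compression ratio, but a smaller decompressor, since no pre-trained weights would need to be included.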
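
The trade-off between a pre-trained model and one trained online can be sketched with a toy example. This is only an illustration: it uses a simple order-0 byte-frequency model, not RWKV or an LSTM, and the function names are hypothetical rather than taken from tensorflow-compress or ts_zip. It measures the ideal code length (the sum of -log2 p over all symbols) that an arithmetic coder would approach.

```python
import math


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length when the model starts untrained and learns online.

    The decompressor can repeat the same updates after decoding each byte,
    so no model parameters have to be shipped with it.
    """
    counts = [1] * 256  # uniform prior: every byte value starts with count 1
    total = 256
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)  # cost under the current model
        counts[b] += 1  # online update, done only after the byte is coded
        total += 1
    return bits


def static_code_length_bits(data: bytes, training: bytes) -> float:
    """Ideal code length with a model fitted in advance on other data.

    The fitted counts play the role of pre-trained weights: the decompressor
    must already contain them, which increases its size.
    """
    counts = [1] * 256
    for b in training:
        counts[b] += 1
    total = sum(counts)
    return sum(-math.log2(counts[b] / total) for b in data)


if __name__ == "__main__":
    data = b"abracadabra" * 50
    training = b"abracadabra" * 1000  # stand-in for a pre-training corpus
    print("adaptive bits:", round(adaptive_code_length_bits(data)))
    print("static bits:  ", round(static_code_length_bits(data, training)))
```

On this toy input the pre-fitted model needs fewer bits, because the online model spends its early symbols learning the distribution. In exchange, the online model requires no stored parameters on the decompressor side.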