ggerganov / ggml

Tensor library for machine learning
MIT License
10.96k stars 1.01k forks

Suggestion: Implement Text Completion and Compression using GPT-2 #1

Closed trholding closed 1 month ago

trholding commented 1 year ago

Fabrice Bellard's [gpt2tc](https://bellard.org/libnc/gpt2tc.html) is an excellent open-source tool that uses GPT-2 to compress text to very small sizes. However, it depends on [libnc](https://bellard.org/libnc/) ([docs](https://bellard.org/libnc/libnc.html)), which is distributed only as a binary blob / library. Maybe gpt2tc could be re-implemented with ggml. Would be awesome.

gpt2tc performs excellently on very small texts and produces the smallest file sizes.

Another compressor based on libnc, by the same author, is [nncp](https://bellard.org/nncp/); it currently holds the world record in text compression: http://www.mattmahoney.net/dc/text.html
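For context, gpt2tc's core idea — driving an arithmetic coder with a language model's next-token probabilities — can be sketched in a few lines. This is a minimal illustration, not gpt2tc's actual implementation: `model_probs` below is a hypothetical stand-in that returns a fixed distribution, whereas gpt2tc queries GPT-2 for P(next token | context):

```python
from fractions import Fraction

def model_probs(context):
    # Hypothetical stand-in for the model: a fixed distribution over three
    # symbols. gpt2tc would instead return GPT-2's predicted distribution
    # conditioned on the preceding tokens.
    return {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}

def encode(symbols):
    # Narrow the interval [low, high) once per symbol. The better the model
    # predicts the actual symbol (larger p), the less the interval shrinks,
    # so fewer bits are needed to pin down a number inside it.
    low, high, context = Fraction(0), Fraction(1), []
    for s in symbols:
        cum = Fraction(0)
        for sym, p in model_probs(context).items():
            if sym == s:
                low, high = low + (high - low) * cum, low + (high - low) * (cum + p)
                break
            cum += p
        context.append(s)
    return (low + high) / 2  # any rational in [low, high) identifies the message

def decode(code, n):
    # Mirror the encoder exactly: the decoder must recompute the *same*
    # probabilities, which is why bit-exact model reproducibility matters.
    out, context = [], []
    low, high = Fraction(0), Fraction(1)
    for _ in range(n):
        cum = Fraction(0)
        for sym, p in model_probs(context).items():
            lo = low + (high - low) * cum
            hi = low + (high - low) * (cum + p)
            if lo <= code < hi:
                out.append(sym)
                low, high = lo, hi
                break
            cum += p
        context.append(out[-1])
    return "".join(out)
```

A real coder emits bits incrementally with integer-range arithmetic rather than exact rationals, but the interval-narrowing logic is the same.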

nncp

[nncp](https://bellard.org/nncp/) is a free, experimental file compressor by Fabrice Bellard, released May 8, 2019. It uses a neural network model with dictionary preprocessing described in the paper [Lossless Data Compression with Neural Networks](https://bellard.org/nncp/nncp.pdf). Compression of enwik9 uses the options:
    ./preprocess c out.words enwik9 out.pre 16384 512
    ./nncp -n_layer 7 -hidden_size 384 -n_embed_out 5 -n_symb 16388 -full_connect 1 -lr 6e-3 c out.pre out.bin
Version 2019-11-16 was released Nov. 16, 2019. It was run in 8 threads.

Version 2 was released Jan. 3, 2021. It uses a [transformer](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)) architecture, which replaces recurrence with an attention mechanism to allow parallelism. The algorithm is described briefly [here](https://bellard.org/nncp/nncp_v2.pdf). It uses the same dictionary preprocessing as earlier versions. It was tested with an [Intel Xeon E3-1230 v6](https://ark.intel.com/content/www/us/en/ark/products/97474/intel-xeon-processor-e3-1230-v6-8m-cache-3-50-ghz.html) at 3.5 GHz and a [GeForce RTX 3090 GPU](https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090/) with 10,496 CUDA cores and 24 GB of memory.

nncp v2.1 was released Feb. 6, 2021. It is the same code as v2 except for a larger model and slightly different hyperparameters.

nncp v3 was released Apr. 24, 2021. This new version is coded in C and supports recent NVIDIA GPUs. It is much faster (3x) due to algorithmic improvements and requires less memory. The Transformer model is similar (199M parameters) but the hyperparameters have been tuned.

nncp v3.1 was released June 1, 2021.

    Program          Options  Compressed size            Decompresser  Total size    Time (ns/byte)     Mem
                              enwik8       enwik9        size (zip)    enwik9+prog   Comp     Decomp    (MB)   Alg          Notes
    ---------------  -------  ----------   -----------   -----------   -----------   ------   -------   -----  ---          -----
    nncp 2019-05-08           16,791,077   125,623,896   161,133 xd    125,785,029   420168   602409     2040  LSTM         84
    nncp 2019-11-16           16,292,774   119,167,224   238,452 xd    119,405,676   826048   1156467    5360  LSTM         84
    nncp v2                   15,600,675   114,317,255    99,671 xd    114,416,926   308645   313468    17000  Transformer  88
    nncp v2.1                 15,020,691   112,219,309   100,046 xd    112,319,355   508332   515401    23000  Transformer  88
    nncp v3                   15,206,966   110,034,293   197,491 xd    110,231,784   161812   158982     6000  Transformer  88
    nncp v3.1                 14,969,569   108,378,032   201,620 xd    108,579,652   212766   210970     6000  Transformer  88
ggerganov commented 1 year ago

I am familiar with libnc and gpt2tc - it partially inspired me to work on ggml.

Currently, ggml is missing implementations of back-propagation for some of the operators. I'll probably add them some time in the future.
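For readers unfamiliar with what a per-operator backward pass involves: each operator needs its own gradient rule. For matrix multiplication C = A·B, for example, the rule maps the incoming gradient dL/dC to dL/dA = dC·Bᵀ and dL/dB = Aᵀ·dC. A NumPy sketch (not ggml's API), checked against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# Forward pass and a scalar loss.
C = A @ B
loss = C.sum()

# Analytic backward for matmul: dL/dC flows in, dL/dA and dL/dB flow out.
dC = np.ones_like(C)   # dL/dC for loss = C.sum()
dA = dC @ B.T          # dL/dA
dB = A.T @ dC          # dL/dB

# Finite-difference check of one gradient entry.
eps = 1e-6
A2 = A.copy()
A2[0, 0] += eps
numeric = ((A2 @ B).sum() - loss) / eps
assert abs(numeric - dA[0, 0]) < 1e-4
```

A tensor library that supports training needs one such backward kernel per forward operator, wired together by the chain rule over the computation graph.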

The main problem is making the results reproducible across different CPUs and thread counts. This is something I have neglected in the current implementation, and it will require some rework to get right.
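A minimal illustration of why this matters (plain Python, not ggml code): floating-point addition is not associative, so a reduction split across threads can give a different result than a serial loop. Since the decompressor must recompute bit-identical probabilities to follow the arithmetic coder, even a last-bit difference corrupts the decoded stream.

```python
# Inputs chosen so that summation order visibly changes the result.
vals = [0.1] * 10 + [1e16, -1e16]

# One order of summation (serial loop): the ~1.0 accumulated from the
# small values is absorbed when 1e16 is added, then cancelled away.
serial = 0.0
for v in vals:
    serial += v

# A parallel reduction might sum the chunks in a different order,
# cancelling the large values first and keeping the small ones.
reordered = sum(vals[10:]) + sum(vals[:10])

# serial and reordered differ even though the inputs are identical.
```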

Finally, you would probably need to run the training on a GPU, so it would be necessary to port the implementation to some GPU framework.

Overall, it is not a simple task to achieve what Fabrice has done 😄

trholding commented 1 month ago

Closing, as this is either only possible in the future or no longer relevant.