CakeML / cakeml

CakeML: A Verified Implementation of ML
https://cakeml.org
Other
944 stars 81 forks source link

Formalise DEFLATE and add implementation to examples #806

Open myreen opened 3 years ago

myreen commented 3 years ago

Deflate is a lossless data compression file format that zip and gzip are based on.

The deflate algorithm would be a nice formalisation and verification exercise in HOL. The result of the formalisation effort could be used to construct a verified gzip-like executable using CakeML.

The same compression and decompression algorithm might come in handy inside the CakeML compiler, once the compiler stores bulky constant data structures as read-only data. Such bulky constants could, for example, be the AST for the basis library or the bignum library.

This project is likely to fit the scope of an internship or MSc thesis.

konrad-slind commented 3 years ago

Related work:

https://hal.archives-ouvertes.fr/hal-02149909/document

On Mon, Nov 16, 2020 at 8:07 AM myreen notifications@github.com wrote:

Deflate https://en.wikipedia.org/wiki/DEFLATE is a lossless data compression file format that zip https://en.wikipedia.org/wiki/ZIP_(file_format) and gzip https://en.wikipedia.org/wiki/Gzip are based on.

The deflate algorithm https://tools.ietf.org/html/rfc1951 would be a nice formalisation and verification exercise in HOL. The result of the formalisation effort could be used to construct a verified gzip-like executable using CakeML.

The same compression and decompression algorithm might come in handy inside the CakeML compiler, once the compiler stores bulky constant data structures as read-only data. Such bulky constants could, for example, be the AST for the basis library or the bignum library.

This project is likely to fit the scope of an internship or MSc thesis.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/CakeML/cakeml/issues/806, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIOAD7JSIL7TJM64H4F4YTSQEWZLANCNFSM4TXGSYBA .

myreen commented 3 years ago

Work on this could advance in stages:

  1. Implement the compression and decompression algorithms as functions in HOL
  2. Make it easy to prove ∀s. decompress (compress s) = s by making compress test whether decompress works for its output: if not, then it can return its input tagged as uncompressed.
  3. Extensively test that compress won't produce uncompressed data.
  4. Prove that the compress function always produces properly compressed data.
mn200 commented 3 years ago

Something very similar to the calculation of a Huffman tree given the weights for the alphabet is done in HOL’s examples/computability/kolmog/kraft_ineqScript.sml. (Weights are specified by giving the desired length of the code for each “symbol”.). It should probably be possible to refactor so as to extract the code for the tree-building etc.

lxndrcx commented 3 years ago

I'm going to give this a go for my Honours project.

lxndrcx commented 3 years ago

slowly progressing here