OUPL / MLCert

Certified Machine Learning
40 stars 4 forks source link

Cannot replicate extraction to MLCert #8

Open wenkokke opened 4 years ago

wenkokke commented 4 years ago

I'm currently trying to replicate the extraction from an EMNIST network to MLCert. This is as far as I've gotten trying to get the code in NNCert to work, patching things up as I go:

  1. Install Miniconda
  2. conda create --name NNCert python=3.5
  3. conda activate NNCert
  4. pip install tensorflow==1.6.1
  5. Download EMNIST
  6. Patch MLCert/NNCert/tf/extract_emnist.py:
    • Line 6: Change emnist_load_data() to emnist_load_data(<path/to/emnist>)
    • Note: The script will try to load from <path/to/emnist>/emnist
  7. python extract_emnist.py (from MLCert/NNCert/tf)
  8. Patch MLCert/NNCert/tf/dataset_params.py:
    • Restore definition make_dataset:
      def make_dataset(images, labels):
      return DataSet(images, labels, reshape=False, dtype=tf.uint8)
  9. Patch MLCert/NNCert/tf/pca.py:
    • Line 9: Change _, _, _, _, _, load_data = .. to _, load_data = ..
  10. python pca.py (from MLCert/NNCert/tf)
  11. Rename generated files <path/to/emnist>/emnist/{train,test,validation}_reduced.pkl to <path/to/emnist>/emnist/{train,test,validation}_pca.pkl
  12. Patch MLCert/NNCert/tf/Makefile:
    • Line 7: Change train.py to main.py
    • Line 10: Change eval.py to main.py
  13. Patch MLCert/NNCert/tf/main.py:
    • Line 41: Change hidden_sizes = map(..) to hidden_sizes = list(map(..))
  14. make (from MLCert/NNCert/tf)
  15. :sob::sob::sob:

    FileNotFoundError: [Errno 2] No such file or directory: 'models/default/params.pkl.gz'
    make: *** [train] Error 1
gstew5 commented 4 years ago

Alex, do you think you could take a look?

@wenkokke, in the meantime, you might try the (currently undocumented) CLI @bagnalla built, by doing:

> cd NNCert
> python3 cli.py

The CLI's dependencies are listed in NNCert/cli_deps.txt (though not with version numbers -- perhaps @bagnalla can add).

wenkokke commented 4 years ago

Using cli.py works for me, using the following steps:

  1. Install Miniconda
  2. conda create --name NNCert python=3.5
  3. conda activate NNCert
  4. pip install tensorflow==1.6.1
  5. pip install sklearn
  6. pip instal gitpython
  7. pip install inquirer
  8. python cli.py (from MLCert/NNCert/)

It was initially rather unclear to me that the output of cli.py was a asking me to make a choice, rather than giving feedback on the progress.

Extracting model...
Makefile:17: Makefile.conf: No such file or directory
COQDEP VFILES
coq_makefile -R ../ MLCert -arg '-w none' net.v out.v bitnet.v kernel.v print.v config.v -o Makefile
Warning: ../ (used in -R or -Q) is not a subdirectory of the current directory

COQC net.v
COQC out.v
File "./out.v", line 8, characters 19-25:
Error: Unable to locate library bitnet.

make[1]: *** [out.vo] Error 1
make: *** [all] Error 2
File "./empiricalloss.v", line 14, characters 19-25:
Error: Unable to locate library bitnet.
wenkokke commented 4 years ago

Extracting a Coq model for EMNIST, with 10.000 nodes in the hidden layer, does not seem to be practical. Extraction ran for two days, and memory usage went up to 42GB, before I cancelled it. If you have had the same experience, @ejgallego has offered to help you optimise the generated Coq code, and has offered some suggestions:

Thanks for the details, after a quick peek to MLCert I think it should be possible to optimize memory consumption one order of magnitude. I'd be happy to help with it, of course if you folks think it'd be worth the effort.

Main things to try would be: use a native float and vector repesentation, profile Coq extraction, see where it is not being eager enough.

I'm guessing that extraction might not be your priority, however, as the networks imported to Coq can already be run using the TensorFlow architecture, and extracting the verified code to Haskell may not offer any additional benefits.

gstew5 commented 4 years ago

Thanks for your help, Wen! It would certainly be interesting to do the port to native vectors and floats (Coq 8.11), as suggested by @ejgallego. We're not actively working on MLCert at the moment, however, so I can't promise this will be done soon (or at all).

ejgallego commented 4 years ago

Hi folks, indeed I believe it should possible to improve the efficiency of extraction quite a bit; just ping us or open an issue at https://github.com/coq/coq/issues and we'll be happy to try to help.