Clearing data used for trace when minimizing existing Docker image

NPann commented 5 years ago

When minimizing an existing Docker image with neurodocker/reprounzip, is there a way to clear the input and output data used or created while generating the trace? By default, these files end up in the minimized image. Or do I need to clean up config.yml in repro-config before running reprounzip?

Thanks!

kaczmarj commented 5 years ago

Hi @NPann, you have a few options here.

1. Modify the config.yml file inside the container and re-pack the traced files into the .rpz file. As the neurodocker script works now, all the traced commands are run, and then the traced files are immediately packed into the .rpz. You can enter the Docker container, edit the config.yml file, and then repack with /tmp/reprozip-miniconda/bin/reprozip pack -d /tmp/neurodocker-reprozip-trace OUTFILE.
2. The .rpz file is an uncompressed tar file that contains a compressed tarball of data plus some metadata. You can extract the .rpz file and the data tarball, remove the files you don't want, then work backwards to recreate the .rpz file.
3. You can modify the Dockerfile or Docker image created by reprounzip docker and then build the image with the --squash option (an experimental docker build feature). Squashing matters because deleting files in a later layer does not shrink the earlier layers that still contain them. It merges every layer into one, so you can't take advantage of Docker's layer caching.
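Option 2 above can be sketched with Python's standard-library tarfile module. This is a minimal sketch under stated assumptions, not reprozip's own tooling: it assumes the data tarball inside the .rpz is a gzip-compressed member named DATA.tar.gz whose entries are prefixed with DATA/, and the helpers strip_rpz, _filter_data and _copy_member are hypothetical names. Metadata members are copied unchanged and data entries matched by a caller-supplied predicate are dropped. Everything is rebuilt in memory, which keeps the sketch short but may be heavy for large traces. It also does not edit METADATA/config.yml, so depending on the reprounzip version you may still need to remove the matching entries there.

```python
import io
import tarfile
from typing import BinaryIO, Callable, List

# Assumption: name of the gzip-compressed data tarball inside the .rpz.
DATA_MEMBER = "DATA.tar.gz"


def _copy_member(src: tarfile.TarFile, dst: tarfile.TarFile,
                 member: tarfile.TarInfo) -> None:
    """Copy one member from src to dst, including its payload if it has one."""
    if member.isfile():
        dst.addfile(member, src.extractfile(member))
    else:
        # Directories, symlinks and hard links carry no file payload.
        dst.addfile(member)


def _filter_data(data_gz: bytes, should_drop: Callable[[str], bool],
                 dropped: List[str]) -> bytes:
    """Return a new gzip tarball without the members matched by should_drop."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(data_gz), mode="r:gz") as inner_in, \
            tarfile.open(fileobj=out, mode="w:gz") as inner_out:
        for member in inner_in.getmembers():
            if should_drop(member.name):
                dropped.append(member.name)
                continue
            _copy_member(inner_in, inner_out, member)
    return out.getvalue()


def strip_rpz(src: BinaryIO, dst: BinaryIO,
              should_drop: Callable[[str], bool]) -> List[str]:
    """Hypothetical helper: copy an uncompressed .rpz tar from src to dst,
    filtering its data tarball. Returns the names of the dropped entries."""
    dropped: List[str] = []
    with tarfile.open(fileobj=src, mode="r:") as rpz_in, \
            tarfile.open(fileobj=dst, mode="w:") as rpz_out:
        for member in rpz_in.getmembers():
            if member.name != DATA_MEMBER:
                # Metadata such as METADATA/config.yml is copied unchanged.
                _copy_member(rpz_in, rpz_out, member)
                continue
            new_data = _filter_data(rpz_in.extractfile(member).read(),
                                    should_drop, dropped)
            member.size = len(new_data)  # header must match the new payload
            rpz_out.addfile(member, io.BytesIO(new_data))
    return dropped


# Example (file names and the DATA/data/ prefix are hypothetical):
# with open("trace.rpz", "rb") as src, open("trace-clean.rpz", "wb") as dst:
#     strip_rpz(src, dst, lambda name: name.startswith("DATA/data/"))
```

Passing a predicate rather than a fixed list keeps the choice of what counts as "trace input/output data" with the caller, since that depends on where the traced commands read and wrote their files.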

I realize these options aren't necessarily trivial, so please let me know if you need help. I think option 2 would be the best one to try first.

NPann commented 5 years ago

Thanks! I'll give option 2 a shot, since it seems pretty easy to implement.

NPann commented 5 years ago

Finally got a chance to try this and #2 worked like a charm. Thanks!

ReproNim / neurodocker, issue #280