burlen opened this issue 4 years ago
@taobrienlbl is there any expertise on the team we can engage about this issue?
@burlen, no, not on the team. There might be more broadly at NERSC, but I'm not sure.
I also realized yesterday that pytorch is going to be problematic for the Shifter implementation that @elbashandy is working on: the pytorch installation itself is just under 2GB, so the Docker image has to be enormous. That also might be a problem at scale: we'll have to find out.
I don't think Shifter will be a problem: the Docker image gets downloaded once at NERSC using `shifterimg pull`,
and that image is then used when we submit a job. Something like this:
#!/bin/bash
#SBATCH --image=docker:image_name:latest
#SBATCH --qos=regular
#SBATCH -N 2
#SBATCH -C haswell
srun -n 64 shifter python3 ~/hello.py
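For completeness, the image would first be pulled once on a login node before submitting; a sketch of that step (the image name here is a placeholder, matching the one in the batch script above):

```shell
# pull the image once on a login node; Shifter caches it site-wide,
# so compute nodes don't each re-download it
shifterimg pull docker:image_name:latest

# confirm the image is available before submitting
shifterimg images | grep image_name
```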
Docker images are considered too large from NERSC's perspective if they exceed 20GB, as mentioned here:
I see what you're saying, and I probably should have phrased it as 'could be problematic'. The potential issue that comes to mind is the amount of memory the image takes up on each node vs the amount of memory the TECA algorithm needs for processing.
I see... that makes sense.
@burlen Thanks for sharing the papers, they're great! The 'deep compression' paper does a three-stage compression. Maybe we can try the first stage, pruning, using torch's own pruning feature (`torch.nn.utils.prune`)
I will research this further
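As a starting point, a minimal sketch of what that first stage could look like with `torch.nn.utils.prune` — the small `nn.Sequential` here is just a stand-in, not the actual deeplab model:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# stand-in model; the real target would be the deeplab network
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 8, 3))

# L1 unstructured pruning: zero out the 30% smallest-magnitude
# weights in each conv layer
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# measure the resulting sparsity
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.2%}")
```

Note that pruning alone only zeroes weights; to actually shrink the on-disk model we'd also need sparse storage or the later quantization/coding stages from the paper.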
Cool, I'm glad you are interested in learning more. All I can say at this point about the citations is that they indicate there are some options out there. We'd need to know more before investing a lot of effort.
we should contact Prabhat.
see also #317
In a conversation with Ankur, he says newer revisions use a different base model (MobileNet?) that is smaller. He expressed interest in helping get the latest developments into TECA. We need to sync up with him before proceeding.
The test suite keeps failing while trying to download these files. The size of these files is going to impact us at scale, and the size of the model is going to make deployment via PyPI problematic; we may have to disable the deeplab stuff from our PyPI package.
There are some techniques that can reduce the model size:

Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149 (2015). https://arxiv.org/abs/1510.00149

Jia, Haipeng, et al. "DropPruning for model compression." arXiv preprint arXiv:1812.02035 (2018). https://arxiv.org/abs/1812.02035
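To make the "trained quantization" stage of the deep-compression paper concrete, here is a toy numpy sketch of weight sharing: cluster the weights with a simple k-means and replace each weight by its centroid, so only a small codebook plus per-weight indices need to be stored. The helper `quantize_weights` is hypothetical, not part of TECA or PyTorch:

```python
import numpy as np

def quantize_weights(w, n_clusters=16, n_iter=20):
    """Toy weight-sharing quantization: k-means over the weights,
    then replace each weight by its cluster centroid."""
    flat = w.ravel()
    # initialize centroids linearly over the weight range (as in the paper)
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[idx].reshape(w.shape), idx.reshape(w.shape)

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
wq, codes = quantize_weights(w)
print("unique values after quantization:", np.unique(wq).size)  # at most 16
```

With 16 clusters each index fits in 4 bits, versus 32 bits per float32 weight, which is where the bulk of the storage savings in that stage comes from.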