LBL-EESA / TECA

TECA, the Toolkit for Extreme Climate Analysis, contains a collection of climate analysis algorithms targeted at extreme event detection and analysis.

deeplab ar detector model size #358

Open burlen opened 4 years ago

burlen commented 4 years ago

The test suite keeps failing trying to download these files. The size of these things is going to impact us at scale. The size of the model is going to make deployment via PyPI problematic, and we may have to disable the deeplab stuff in our PyPI package.

There are some techniques that can reduce the model size:

Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149 (2015). https://arxiv.org/abs/1510.00149

Jia, Haipeng, et al. “DropPruning for Model Compression.” arXiv preprint arXiv:1812.02035 (2018).
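As a rough illustration of the second stage in the deep-compression paper (weight sharing / quantization), here is a dependency-free sketch. The uniform binning below is a simplification of the trained k-means quantization the paper actually uses:

```python
def quantize(weights, n_levels=16):
    """Map each weight to one of n_levels shared values.

    A crude stand-in for the trained k-means quantization in Han et al.:
    after quantization, only the codebook and per-weight indices need to
    be stored, which is where the size reduction comes from.
    """
    lo, hi = min(weights), max(weights)
    if lo == hi:  # degenerate case: all weights identical, nothing to do
        return list(weights)
    step = (hi - lo) / (n_levels - 1)
    # Snap every weight to its nearest of the n_levels grid points.
    return [lo + round((w - lo) / step) * step for w in weights]

print(quantize([0.0, 0.26, 0.74, 1.0], n_levels=3))  # → [0.0, 0.5, 0.5, 1.0]
```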

burlen commented 4 years ago

@taobrienlbl is there any expertise on the team we can engage about this issue?

taobrienlbl commented 4 years ago

@burlen, no, not on the team. There might be more broadly at NERSC, but I'm not sure.

I also realized yesterday that pytorch is going to be problematic for the Shifter implementation that @elbashandy is working on: the pytorch installation itself is just under 2GB, so the Docker image has to be enormous. That also might be a problem at scale: we'll have to find out.

elbashandy commented 4 years ago

I don't think Shifter will be a problem: the Docker image gets downloaded once at NERSC using shifterimg pull, and that image is then reused when we submit a job. Something like this:

#!/bin/bash
#SBATCH --image=docker:image_name:latest
#SBATCH --qos=regular
#SBATCH -N 2
#SBATCH -C haswell

srun -n 64 shifter python3 ~/hello.py
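For reference, pulling the image once on a login node might look like the following (image_name:latest is the placeholder from the batch script above, not a real image):

```shell
# Pull the Docker image into Shifter's image store once (placeholder name).
shifterimg pull docker:image_name:latest

# Check that the image is available before submitting the batch job.
shifterimg images
```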

Docker images are considered too large from NERSC's perspective only if they exceed 20GB, as mentioned here:

https://docs.nersc.gov/programming/shifter/how-to-use/

taobrienlbl commented 4 years ago

I see what you're saying, and I probably should have phrased it as 'could be problematic'. The potential issue that comes to mind is the amount of memory the image takes up on each node vs the amount of memory the TECA algorithm needs for processing.

elbashandy commented 4 years ago

I see... that makes sense.

elbashandy commented 4 years ago

@burlen Thanks for sharing the papers, they are great! The 'deep compression' paper does a three-stage compression. Maybe we can try the first stage, pruning, using torch's own pruning feature.

I will research this further
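The torch feature referred to here is presumably torch.nn.utils.prune (e.g. l1_unstructured). To keep the example dependency-free, here is a sketch of what L1 magnitude pruning does to a flat weight vector: the given fraction of smallest-magnitude weights is masked to zero.

```python
def magnitude_prune(weights, amount):
    """Zero out the `amount` fraction of smallest-magnitude weights.

    Mirrors the effect of torch.nn.utils.prune.l1_unstructured applied
    to a flattened weight tensor: small weights become exact zeros,
    which sparse storage formats can then exploit.
    """
    n_prune = int(len(weights) * amount)
    # Indices ordered by ascending |w|; the first n_prune get pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

w = [0.5, -0.01, 2.0, 0.003, -1.2, 0.02]
print(magnitude_prune(w, 0.5))  # → [0.5, 0.0, 2.0, 0.0, -1.2, 0.0]
```

In the paper, pruning is followed by retraining the surviving weights, which is what recovers the accuracy lost by masking.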

burlen commented 4 years ago

Cool, I'm glad you are interested in learning more. All I can say at this point about the citations is that they indicate there are some options out there. We'd need to know more before investing a lot of effort.

we should contact Prabhat.

burlen commented 4 years ago

see also #317

burlen commented 4 years ago

In a conversation with Ankur, he said newer revisions use a different, smaller base model (MobileNet?). He expressed interest in helping get the latest developments into TECA. We need to sync up with him before proceeding.