Optimize ONNX quantization on specific layer for baseline model

NeuroDesk / cvpr-sam-on-laptop-2024

Apache License 2.0


Optimize ONNX quantization on specific layer for baseline model #2

Closed iishiishii closed 6 months ago

iishiishii commented 7 months ago

Goal: Reduce model size and runtime while preserving accuracy.

TODO:

Export to ONNX: accuracy is unchanged; runtime is slightly better for 2D images but slightly worse for 3D images.
Optimize the ONNX model by combining some ops (operator fusion): online graph optimization doesn't improve runtime, but offline optimization does.
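For readers unfamiliar with the online/offline distinction, here is a minimal sketch of offline optimization with ONNX Runtime. It is not taken from this repository: the function names and the choice of `ORT_ENABLE_EXTENDED` are assumptions, and the model paths are left as parameters. The idea is to run the graph optimizations once, save the optimized graph, and load that file later with optimizations disabled so the work is not repeated at session start.

```python
def save_optimized_model(model_path, optimized_path):
    """Run ONNX Runtime graph optimizations once and write the result to disk."""
    # Imported lazily so this module can be loaded without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Assumption: extended-level fusions; the level used in the issue is not stated.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    # Setting this path makes ONNX Runtime serialize the optimized graph.
    options.optimized_model_filepath = optimized_path
    # Creating the session triggers the optimization and the save.
    ort.InferenceSession(
        model_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
    return optimized_path


def load_preoptimized(optimized_path):
    """Load an already optimized model without re-running graph optimizations."""
    import onnxruntime as ort

    options = ort.SessionOptions()
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    return ort.InferenceSession(
        optimized_path, sess_options=options, providers=["CPUExecutionProvider"]
    )
```

The optimized file should be generated on hardware similar to the target machine, since some optimizations depend on the execution provider.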

File Name	Original ONNX runtime (s)	Offline Opt ONNX decoder runtime (s)	Offline Opt ONNX encoder + decoder runtime (s)
2DBox_CXR_demo.npz	1.0211	1.0035	0.9967
2DBox_Dermoscopy_demo.npz	2.0668	2.0804	2.0970
2DBox_Endoscopy_demo.npz	1.9663	1.9134	1.8685
2DBox_Fundus_demo.npz	2.9207	2.8813	2.9211
2DBox_Mammography_demo.npz	4.4208	4.4089	4.3516
2DBox_Microscope_demo.npz	5.8922	5.7807	5.7084
2DBox_OCT_demo.npz	2.2883	2.1902	2.2225
2DBox_US_demo.npz	1.3955	1.3359	1.3974
3DBox_CT_demo.npz	7.3724	7.0056	6.5934
3DBox_MR_demo.npz	45.2152	42.6667	43.1211
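To make the size of the improvement easier to read, the relative runtime reduction can be computed from the table. The sketch below only re-uses the numbers above; the variable and function names are illustrative, and the table does not say how many runs each figure averages, so small differences may be noise.

```python
# Runtimes in seconds, copied from the table above:
# (original ONNX, offline-optimized decoder, offline-optimized encoder + decoder)
RUNTIMES = {
    "2DBox_CXR_demo.npz": (1.0211, 1.0035, 0.9967),
    "2DBox_Dermoscopy_demo.npz": (2.0668, 2.0804, 2.0970),
    "2DBox_Endoscopy_demo.npz": (1.9663, 1.9134, 1.8685),
    "2DBox_Fundus_demo.npz": (2.9207, 2.8813, 2.9211),
    "2DBox_Mammography_demo.npz": (4.4208, 4.4089, 4.3516),
    "2DBox_Microscope_demo.npz": (5.8922, 5.7807, 5.7084),
    "2DBox_OCT_demo.npz": (2.2883, 2.1902, 2.2225),
    "2DBox_US_demo.npz": (1.3955, 1.3359, 1.3974),
    "3DBox_CT_demo.npz": (7.3724, 7.0056, 6.5934),
    "3DBox_MR_demo.npz": (45.2152, 42.6667, 43.1211),
}


def relative_reduction(baseline, optimized):
    """Fraction of runtime saved relative to the baseline (negative means slower)."""
    return (baseline - optimized) / baseline


# Per-file reduction for each optimized variant.
per_file = {
    name: (relative_reduction(orig, dec), relative_reduction(orig, both))
    for name, (orig, dec, both) in RUNTIMES.items()
}

# Reduction of the summed runtime over all ten demo files.
total_orig = sum(r[0] for r in RUNTIMES.values())
total_dec = sum(r[1] for r in RUNTIMES.values())
total_both = sum(r[2] for r in RUNTIMES.values())
total_reduction = (
    relative_reduction(total_orig, total_dec),
    relative_reduction(total_orig, total_both),
)

for name, (dec, both) in per_file.items():
    print(f"{name:28s} decoder: {dec:+.1%}  encoder+decoder: {both:+.1%}")
print(f"total: decoder {total_reduction[0]:+.1%}, encoder+decoder {total_reduction[1]:+.1%}")
```

On these numbers the summed runtime drops by roughly 4% for both variants, and the Dermoscopy case gets slightly slower rather than faster.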

iishiishii commented 7 months ago

The process was killed by the OOM killer while processing a large 3D bounding box in the validation set.

oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=docker-24ad385f8a96bf9d5e617e51c92441a093ce94f97652613df1397c7b5d7965ce.scope,mems_allowed=0,oom_memcg=/system.slice/docker-24ad385f8a96bf9d5e617e51c92441a093ce94f97652613df1397c7b5d7965ce.scope,task_memcg=/system.slice/docker-24ad385f8a96bf9d5e617e51c92441a093ce94f97652613df1397c7b5d7965ce.scope,task=python3,pid=87005,uid=999
[Wed Apr 3 15:07:55 2024] Memory cgroup out of memory: Killed process 87005 (python3) total-vm:13921856kB, anon-rss:8075896kB, file-rss:258672kB, shmem-rss:4kB, UID:999 pgtables:21360kB oom_score_adj:0
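The kernel message can be read more easily by converting its kB fields to GiB. The sketch below parses the "Killed process" part of the log quoted above; the regular expression and names are illustrative. Assuming Docker's `-m 8G` means 8 GiB, the resident memory (anon + file + shmem) comes to about 7.95 GiB, i.e. right at the container limit.

```python
import re

# The "Killed process" part of the kernel log, copied from the message above.
OOM_LINE = (
    "[Wed Apr 3 15:07:55 2024] Memory cgroup out of memory: Killed process 87005 "
    "(python3) total-vm:13921856kB, anon-rss:8075896kB, file-rss:258672kB, "
    "shmem-rss:4kB, UID:999 pgtables:21360kB oom_score_adj:0"
)

KB_PER_GIB = 1024 * 1024


def parse_kb_fields(line):
    """Return every 'name:<number>kB' field of an OOM log line as {name: kB}."""
    return {name: int(value) for name, value in re.findall(r"([a-z-]+):(\d+)kB", line)}


fields = parse_kb_fields(OOM_LINE)

# Resident set size is the sum of the anonymous, file-backed and shared parts.
rss_kb = fields["anon-rss"] + fields["file-rss"] + fields["shmem-rss"]
rss_gib = rss_kb / KB_PER_GIB

print(f"resident memory at kill time: {rss_gib:.2f} GiB")
```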

iishiishii commented 7 months ago

Steps to reproduce:

Follow the environment setup instructions in the README.
Mount the data from RDM or download it.
Run inference directly:
python CVPR24_LiteMedSamOnnx_infer.py -i ~/rdm/data/datasets/validation/imgs/ -o validation/segs
or with Docker:
docker container run -m 8G --name litemedsam --rm -v ~/rdm/data/datasets/validation/imgs/:/workspace/inputs/ -v $PWD/validation/litemedsam-seg/:/workspace/outputs/ litemedsam:latest /bin/bash -c "sh predict.sh"

@nanthan987 Let me know if it doesn't reproduce on your side.

iishiishii commented 6 months ago

PyTorch profiler

iishiishii commented 6 months ago

The memory peak was caused by the show_mask function. After excluding it and other unused packages, memory usage is ~1 GB.
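A quick way to check which call is responsible for a peak like this is to measure peak allocations around it. The sketch below uses the standard-library `tracemalloc`; `peak_bytes` and `fake_show_mask` are hypothetical names, and `fake_show_mask` is only a stand-in that allocates a buffer, not the repository's `show_mask`. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory taken by native libraries may not show up.

```python
import tracemalloc


def peak_bytes(func, *args, **kwargs):
    """Call func and return (result, peak traced memory in bytes during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def fake_show_mask(n_bytes):
    """Hypothetical stand-in for an expensive visualization call."""
    buffer = bytearray(n_bytes)  # simulate a large temporary image buffer
    return len(buffer)


result, peak = peak_bytes(fake_show_mask, 10_000_000)
print(f"peak traced memory: {peak / 1e6:.1f} MB")
```

Wrapping the real inference step with and without the visualization call in the same way would show how much of the peak it accounts for.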