Torch2Chip (MLSys, 2024)

Torch2Chip is an end-to-end deep neural network compression toolkit designed for prototype accelerator designers, supporting algorithm-hardware co-design with a high degree of algorithm customization.

[Documentation]

:rocket: News & Update

:question: Why Torch2Chip?

The current "design-and-deploy" workflow faces under-explored challenges in the hardware-algorithm co-design community due to several unavoidable flaws:

Figure1

From the hardware designer's perspective, the conflicts among DL frameworks, SoTA algorithms, and current toolkits lead to a cumbersome, iterative chip-prototyping workflow, which is what Torch2Chip aims to resolve.

:star: What is Torch2Chip?

Torch2Chip is a toolkit that enables customized model compression (e.g., quantization) with full-stack observability for custom hardware designers. Starting from user-customized compression algorithms, it meets the bottom-level needs of custom AI hardware designers:

Figure1
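
As a rough illustration of what a user-customized quantizer can look like, here is a minimal, self-contained PyTorch sketch of a symmetric per-tensor INT8 fake quantizer with MinMax calibration. The class and method names are hypothetical and are **not** the Torch2Chip API; they only show the kind of low-precision logic the toolkit is designed to host and observe.

```python
import torch
import torch.nn as nn

class SymmetricInt8Quantizer(nn.Module):
    """Illustrative symmetric per-tensor fake quantizer (not the Torch2Chip API)."""

    def __init__(self, n_bits: int = 8):
        super().__init__()
        self.qmax = 2 ** (n_bits - 1) - 1               # 127 for INT8
        self.register_buffer("scale", torch.tensor(1.0))

    @torch.no_grad()
    def observe(self, x: torch.Tensor) -> None:
        # MinMax-style calibration: derive the scale from the absolute maximum.
        self.scale = x.abs().max().clamp(min=1e-8) / self.qmax

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Quantize-dequantize so the integer tensor can be inspected or dumped
        # while the surrounding model still runs in floating point.
        q = torch.clamp(torch.round(x / self.scale), -self.qmax - 1, self.qmax)
        return q * self.scale

if __name__ == "__main__":
    w = torch.randn(64, 64)
    quant = SymmetricInt8Quantizer()
    quant.observe(w)   # calibration pass
    w_q = quant(w)     # fake-quantized weights
    print((w - w_q).abs().max())
```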

Pre-trained Checkpoint

The model checkpoint is packed together with the tensors extracted from the basic operations (e.g., MatMul, Conv).

```
# Downloaded file
vit_small_lsq_adaround.tar.gz
|
--ptq
  |
  --[quantization method]
    |
    --checkpoint.pth.tar
    |
    --/t2c/
      |
      ----/tensors/
      ----t2c_model.pth.tar
```

The pre-trained checkpoint contains both the model file and all the extracted tensors.
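
For readers who just want to peek inside one of these downloads, the sketch below unpacks the archive and loads the checkpoint with plain `tarfile` and `torch.load`. The file layout follows the tree above; the exact folder name for `[quantization method]` and the keys stored inside `checkpoint.pth.tar` depend on the download, so treat them as assumptions to verify.

```python
import tarfile
from pathlib import Path
import torch

# Unpack the downloaded archive (name taken from the layout above).
archive = "vit_small_lsq_adaround.tar.gz"
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall("vit_small_lsq_adaround")

# Locate every checkpoint under ptq/[quantization method]/ without
# hard-coding the method name.
root = Path("vit_small_lsq_adaround") / "ptq"
for ckpt_path in root.glob("*/checkpoint.pth.tar"):
    # Inspect what the checkpoint stores; the key names are not part of a
    # documented format, so print them rather than assuming them.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    print(ckpt_path, list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```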

Vision model on ImageNet-1K (INT8)

| Model | Pre-trained By | MinMaxChannel + LSQ | AdaRound + LSQ | MinMaxChannel + QDrop | MinMaxChannel + LSQToken | MinMaxChannel + MinMaxToken | MinMaxChannel + QDropToken |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | torchvision | 76.16 (link) | 76.12 (link) | 76.18 (link) | N/A | N/A | N/A |
| ResNet-34 | torchvision | 73.39 (link) | 73.38 (link) | 73.43 (link) | N/A | N/A | N/A |
| ResNet-18 | torchvision | 69.84 (link) | 69.80 (link) | 69.76 (link) | N/A | N/A | N/A |
| VGG16-BN | torchvision | 73.38 (link) | 73.40 (link) | 73.39 (link) | N/A | N/A | N/A |
| MobileNet-V1 | t2c (link) | 71.21 (link) | 69.87 (link) | 71.13 (link) | N/A | N/A | N/A |
| vit_tiny_patch16_224 | timm | 72.79 (link) | 72.65 (link) | 72.41 ([link]()) | 72.49 (link) | 73.27 (link) | 73.00 (link) |
| vit_small_patch16_224 | timm | 81.05 (link) | 81.02 (link) | 80.89 ([link]()) | 80.21 (link) | 80.04 (link) | 80.22 (link) |
| vit_base_patch16_224 | timm | 84.87 (link) | 84.62 (link) | 84.50 (link) | 84.68 (link) | 83.86 (link) | 84.53 (link) |
| swin_tiny_patch4_window7_224 | timm | 80.83 (link) | 80.76 (link) | 80.71 (link) | 80.30 (link) | 80.74 (link) | 80.10 (link) |
| swin_base_patch4_window7_224 | timm | 84.73 (link) | 84.62 (link) | 84.65 (link) | 84.27 (link) | 84.58 (link) | 84.32 (link) |
| BERT-Base-SST2 | HuggingFace | 0.922 (link) | | | | | |
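
The "Pre-trained By" column refers to the FP32 starting point, which comes straight from torchvision, timm, or HuggingFace; the quantized accuracies above require the linked Torch2Chip checkpoints. Below is a minimal sketch of loading the floating-point baseline for one of the rows, using only the standard timm API (nothing Torch2Chip-specific):

```python
import torch
from timm import create_model
from timm.data import resolve_data_config, create_transform

# FP32 baseline for the vit_small_patch16_224 row above.
model = create_model("vit_small_patch16_224", pretrained=True)
model.eval()

# Matching ImageNet-1K preprocessing for evaluation.
config = resolve_data_config({}, model=model)
transform = create_transform(**config)

# Sanity-check forward pass (replace the random tensor with real
# ImageNet validation batches to measure top-1 accuracy).
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```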

Vision model on ImageNet-1K (INT4)

[Coming soon!]

:notes: Authors

Members of Seo Lab @ Cornell University led by Professor Jae-sun Seo.

Jian Meng, Yuan Liao, Anupreetham, Ahmed Hasssan, Shixing Yu, Han-sok Suh, Xiaofeng Hu, and Jae-sun Seo.

:package: Cite Us

Publication: Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design (Meng et al., MLSys, 2024).

Acknowledgement

This work was supported in part by Samsung Electronics and the Center for the Co-Design of Cognitive Systems (CoCoSys) in JUMP 2.0, a Semiconductor Research Corporation (SRC) Program sponsored by the Defense Advanced Research Projects Agency (DARPA).