Add inference notebook without quantization

GirinMan / HYU-Graduation-Project-Quantization

Repository for the graduation project at Hanyang University's Department of Computer Software.

Apache License 2.0


Closed GirinMan closed 1 year ago

GirinMan commented 1 year ago

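Defines a version of the Layer with adapter that does not apply quantization.
Loading the model in fp16 made its data type differ from that of the existing adapter layer, so the adapter layer's dtype is now defined as float_16.
Inference speed is about 4x that of the quantized model.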
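
The dtype fix described above can be illustrated with a minimal sketch. It is not the code from this PR: the class name `LinearWithAdapter`, the bottleneck adapter shape, and the `adapter_dim` parameter are hypothetical, and PyTorch is assumed as the framework. It only shows the idea of creating the adapter weights in the same half-precision dtype as the fp16-loaded base layer.

```python
import torch
import torch.nn as nn


class LinearWithAdapter(nn.Module):
    """Hypothetical non-quantized linear layer with a small bottleneck adapter."""

    def __init__(self, in_features, out_features, adapter_dim=16, dtype=torch.float16):
        super().__init__()
        # Stand-in for a layer of the model loaded in fp16 (assumption).
        self.base = nn.Linear(in_features, out_features).to(dtype)
        # Adapter weights are cast to the same dtype as the base layer so that
        # their output can be added to the base output without a dtype mismatch.
        self.adapter_down = nn.Linear(in_features, adapter_dim, bias=False).to(dtype)
        self.adapter_up = nn.Linear(adapter_dim, out_features, bias=False).to(dtype)
        # Zero-initialize the up-projection so the adapter starts as a no-op.
        nn.init.zeros_(self.adapter_up.weight)

    def forward(self, x):
        # x is expected to be in the same dtype as the layer parameters.
        return self.base(x) + self.adapter_up(self.adapter_down(x))


layer = LinearWithAdapter(64, 64)
# Collect the dtypes of all parameters; base and adapter should agree.
param_dtypes = {p.dtype for p in layer.parameters()}
print(param_dtypes)
```

The forward pass is deliberately not invoked here, since half-precision matrix multiplication is not available on CPU in some PyTorch versions. On a GPU, the layer and an fp16 input would be moved to the device before calling it.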