ProfFan opened this issue 7 months ago
@ProfFan I'm glad this model is useful to you. Snap2LaTex is indeed impressive — thank you for your efforts in making such a cool tool. While I'm not very familiar with quantization, I believe I could develop a smaller Nougat-LaTeX model based on nougat-small. Nougat-small has only 4 decoder layers, and according to my evaluation it can achieve ~40 tokens/s on an A100 with flash-attn2 in fp16.
For simple (non-multiline/array) equations (example image omitted), even the larger model is pretty fast (using the MPS backend), averaging about 4 seconds per equation after the first run (shader compilation, etc.). So the current model is pretty usable already :)
For bigger matrices and multi-line equations the decoding time grows rapidly with the number of output tokens, as expected. Interestingly, converting the model to half precision does not help that much.
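A rough back-of-envelope illustration of why long outputs dominate: autoregressive decoding generates one token at a time, so total time scales with output length. The throughput constant below is an assumption taken from the ~40 tokens/s figure reported earlier in this thread for nougat-small on an A100; real numbers vary a lot by device and backend.

```python
# Naive decode-time estimate for autoregressive generation.
# TOKENS_PER_S is an assumed throughput (from the ~40 tok/s figure
# mentioned in this thread); it is illustrative only.
TOKENS_PER_S = 40.0

def estimated_decode_seconds(num_tokens: int,
                             tokens_per_s: float = TOKENS_PER_S) -> float:
    """Time grows linearly with the number of generated tokens."""
    return num_tokens / tokens_per_s

# A short inline equation vs. a large multi-line array:
print(f"short (40 tokens):  {estimated_decode_seconds(40):.1f}s")
print(f"large (600 tokens): {estimated_decode_seconds(600):.1f}s")
```

This ignores the per-token attention cost growing with context length, so it understates the gap for very long equations, but it captures the first-order effect.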
How can flash-attn2 be added to Nougat? The donut-swin encoder doesn't seem to support flash-attn2.
Hi,
Thank you for this amazing model! I made a small tray utility with your model to convert LaTeX: https://github.com/ProfFan/Snap2LaTeX
However, running it locally is not fast. It would be great if we could make quantized versions suitable for on-device inference :)
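For context on what quantization does: it replaces fp32/fp16 weights with low-bit integers plus a scale factor, shrinking the model and speeding up on-device inference at a small accuracy cost. Below is a minimal pure-Python sketch of a symmetric int8 round trip — a toy illustration of the idea, not tied to any Nougat checkpoint; real deployments would use torch/ONNX/Core ML quantization tooling instead.

```python
# Toy symmetric per-tensor int8 quantization round trip.
# Illustration only: real quantization operates on whole weight
# tensors and often uses per-channel scales.

def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 values and the scale."""
    return [qi * scale for qi in q]

weights = [0.12, -0.5, 0.33, 1.0, -0.97]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.5f}, max round-trip error={max_err:.5f}")
```

The round-trip error is bounded by half the scale step, which is why 8-bit weights usually cost little accuracy while using a quarter of fp32's storage.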