ProfFan opened this issue 7 months ago
@ProfFan I'm glad this model is useful to you. Snap2LaTex is indeed impressive — thank you for your efforts in making such a cool tool. While I'm not very familiar with quantization, I believe I could develop a smaller Nougat-LaTeX model based on nougat-small. Nougat-small has only 4 decoder layers, and according to my evaluation it can achieve ~40 tokens/s on an A100 with flash-attn2 in fp16.
For simple (non-multiline/array) equations (example image omitted), even the larger model is pretty fast (using the MPS backend), averaging about 4 seconds per equation after the first run (shader compilation, etc.). So the current model is pretty usable already :)
For bigger matrices and multi-line equations the decoding time grows rapidly with the number of output tokens, as expected. Interestingly, converting the model to half precision does not help that much.
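A rough back-of-envelope illustration of why long outputs dominate: autoregressive decoding generates one token at a time, so total time scales with output length. The throughput constant below is an assumption taken from the ~40 tokens/s figure reported earlier in this thread for nougat-small on an A100; real numbers vary a lot by device and backend.

```python
# Naive decode-time estimate for autoregressive generation.
# TOKENS_PER_S is an assumed throughput (from the ~40 tok/s figure
# mentioned in this thread); it is illustrative only.
TOKENS_PER_S = 40.0

def estimated_decode_seconds(num_tokens: int,
                             tokens_per_s: float = TOKENS_PER_S) -> float:
    """Time grows linearly with the number of generated tokens."""
    return num_tokens / tokens_per_s

# A short inline equation vs. a large multi-line array:
print(f"short (40 tokens):  {estimated_decode_seconds(40):.1f}s")
print(f"large (600 tokens): {estimated_decode_seconds(600):.1f}s")
```

This ignores the per-token attention cost growing with context length, so it understates the gap for very long equations, but it captures the first-order effect.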
How can flash-attn2 be added to Nougat? The donut-swin encoder doesn't seem to support flash-attn2.
Hi,
Thank you for this amazing model! I made a small tray utility with your model to convert LaTeX: https://github.com/ProfFan/Snap2LaTeX
However, running it locally is not fast. It would be great if we could make quantized versions suitable for on-device inference :)
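For context on what quantization does: it replaces fp32/fp16 weights with low-bit integers plus a scale factor, shrinking the model and speeding up on-device inference at a small accuracy cost. Below is a minimal pure-Python sketch of a symmetric int8 round trip — a toy illustration of the idea, not tied to any Nougat checkpoint; real deployments would use torch/ONNX/Core ML quantization tooling instead.

```python
# Toy symmetric per-tensor int8 quantization round trip.
# Illustration only: real quantization operates on whole weight
# tensors and often uses per-channel scales.

def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from int8 values and the scale."""
    return [qi * scale for qi in q]

weights = [0.12, -0.5, 0.33, 1.0, -0.97]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.5f}, max round-trip error={max_err:.5f}")
```

The round-trip error is bounded by half the scale step, which is why 8-bit weights usually cost little accuracy while using a quarter of fp32's storage.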