Open denischernenko opened 1 month ago
Heya @denischernenko - I was able to get MPS acceleration going with surya-ocr by following Apple's guide here: https://developer.apple.com/metal/pytorch/
Note that there is no support for this within containers or VMs, but so long as you're running the nightly PyTorch build natively on a modern MacBook, it's just about as fast as when using a GPU. I've only tested this on an M3 MacBook Air.
Hope this helps!
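For anyone following the guide above, a quick sanity check is to confirm that the installed PyTorch build actually exposes the MPS backend before pointing surya-ocr at it. This is only a minimal sketch: `pick_device` is a hypothetical helper name, not part of surya-ocr, and it falls back to the CPU if PyTorch is missing or MPS is unavailable (for example inside a container or VM).

```python
def pick_device() -> str:
    """Return the best available PyTorch device name, falling back to CPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU path is possible.
        return "cpu"

    # Older PyTorch builds may not ship the MPS backend module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_built() and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints `cpu` on an Apple Silicon machine, the likely suspects are a PyTorch build without MPS support or a non-native (emulated or containerized) Python.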
@erulabs Your timing is perfect! I just finished building a pipeline that combines Surya with LM Studio running a Mistral model to correct Surya's OCR errors. A 356-page PDF took 7 hours to go through the whole process...
I really hope that following the instructions you provided will speed it up.
Thank you very much for taking time and posting this!
@erulabs It did not really do much for overall performance. In fact, MPS is running out of memory even more often now :) Thank you for trying, though...
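One generic way to cope with out-of-memory failures like these is to retry with a smaller batch instead of crashing. The sketch below is not surya-ocr's own behavior: `run_with_backoff` and `process_batch` are hypothetical names, and it assumes the failure surfaces as a `RuntimeError` whose message contains "out of memory", which is how PyTorch typically reports it on MPS.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_backoff(
    process_batch: Callable[[Sequence[T]], List[R]],
    items: Sequence[T],
    batch_size: int,
    min_batch_size: int = 1,
) -> List[R]:
    """Process items in batches, halving the batch size after each OOM error."""
    results: List[R] = []
    i = 0
    while i < len(items):
        batch = items[i:i + batch_size]
        try:
            results.extend(process_batch(batch))
            i += len(batch)  # advance only after the batch succeeded
        except RuntimeError as err:
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise  # unrelated error, or nothing left to shrink
            # Retry the same slice with half the batch size.
            batch_size = max(min_batch_size, batch_size // 2)
    return results
```

Here `process_batch` would wrap whatever detection or recognition call is being made. With PyTorch 2.x it may also help to call `torch.mps.empty_cache()` before retrying, though whether that frees enough memory depends on the workload.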
Vikas, first of all, thank you very much indeed for creating such amazing software and making it accessible to ordinary people.
I have a MacBook M1 Pro with 16 GB of RAM (10-core CPU/16-core GPU) and, naturally, I added these lines to my Python code (I did not use the GUI):
`os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"
os.environ["PYTORCH_DEVICE"] = "mps"
settings.DETECTOR_BATCH_SIZE = 1024
settings.RECOGNITION_BATCH_SIZE = 128`
However, the whole OCR process (especially recognition) seems to be pretty slow. I tried shutting down all other applications, running the code from Terminal.app instead of the PyCharm IDE, and spent many hours on Perplexity, Claude, ChatGPT and Gemini looking for code improvements, but none of it improved the performance.
Question: am I right in assuming that the main bottleneck here is the PyTorch library, with its (very) limited support for the Metal Performance Shaders framework? And is the only hope to wait for the PyTorch community to release newer versions with better MPS handling? No blame intended for those development heroes who created such a massive and useful library, of course. Or do I just need to rent a cloud server for a few days to OCR my whole library? :)
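Before attributing the slowdown to the backend, it can help to measure where the time actually goes. The following is a minimal sketch using only the standard library; `timed`, `stage_seconds` and `report` are hypothetical helper names, and the stage labels are placeholders for the real detection and recognition calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named pipeline stage.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage: str):
    """Add the wall-clock time spent inside the block to the stage's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def report(totals) -> str:
    """Format stage totals, slowest first, with each stage's share of the total."""
    grand = sum(totals.values()) or 1.0
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(
        f"{stage:<12} {sec:8.2f}s {100 * sec / grand:5.1f}%" for stage, sec in rows
    )


if __name__ == "__main__":
    with timed("detection"):
        time.sleep(0.01)  # placeholder for the real detection call
    with timed("recognition"):
        time.sleep(0.02)  # placeholder for the real recognition call
    print(report(stage_seconds))
```

Because GPU work is queued asynchronously, timings on MPS can be misleading unless the device is synchronized before the clock stops; with PyTorch 2.x, calling `torch.mps.synchronize()` at the end of each timed block is one way to do that. Running the same measurement with the device set to the CPU gives a baseline for comparison.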
Thank you very much for reading this and all the best of luck to you and all the contributors of this project!
P.S. If posting questions like this in the 'Issues' section violates the community rules here, I will remove it.