Closed zero1zero closed 3 weeks ago
What if you manually override batch_multiplier and reduce it so that it stays within VRAM limits. This might slow down but can stay within the VRAM limit.
I set it explicitly to 1 as a parameter to convert_single_pdf above, should that have the same effect?
I also tried setting the surya batch size to as low as I can but that didn't seem to have an effect: https://github.com/VikParuchuri/marker/blob/master/marker/settings.py#L41
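Marker reads its limits from environment variables in `settings.py`, so in a notebook they must be set before marker is imported or the defaults are already baked in. A minimal sketch of that ordering, assuming the `INFERENCE_RAM` name from this thread (the batch-size variable name is a guess; check the `settings.py` of your installed version):

```python
import os

# Overrides must be in place before marker is imported, because
# marker/settings.py reads the environment at import time.
os.environ["INFERENCE_RAM"] = "14"           # T4 has ~15 GB of VRAM; leave headroom
os.environ["RECOGNITION_BATCH_SIZE"] = "16"  # hypothetical name: verify against settings.py

# Only now import marker, so the settings pick up the overrides:
# from marker.models import load_all_models
# from marker.convert import convert_single_pdf
```

If the variables are set after the import, marker's settings object will never see them, which can look like the setting "has no effect".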
Alright, this looks to be an effect of the Spark UDF manipulation, which was errantly calling load_models twice. The code above now works as expected using INFERENCE_RAM.
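The load-once pattern that resolved this can be sketched generically: cache the model handle per worker process so repeated UDF invocations reuse it instead of loading a second copy onto the GPU. A minimal, marker-free sketch, where `get_models` stands in for a cached call to marker's `load_all_models()`:

```python
from functools import lru_cache

LOAD_CALLS = 0  # instrumentation: shows the expensive load runs only once


@lru_cache(maxsize=1)
def get_models():
    """Load models once per Python worker process and cache the handle."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    # In the real job this would be: return load_all_models()
    return object()


def convert_udf(path: str) -> str:
    """Per-row work: reuses the cached models instead of reloading them."""
    models = get_models()
    return f"converted:{path}:{id(models)}"


# Two invocations on the same worker share a single model load.
convert_udf("a.pdf")
convert_udf("b.pdf")
```

If a Spark job instead constructs the models inside the UDF body (or calls the loader from two code paths), each call allocates fresh weights on the GPU, which is exactly the double-`load_models` OOM described above.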
I'm attempting to use marker as part of my Spark job and am having trouble getting it to not CUDA OOM. Here is my code:

`convert_markdown` runs multiple times in a single thread, with `load_all_models()` executing only once. Should marker respect INFERENCE_RAM, or are there other settings that need to be adjusted to get it to stay within my VRAM limits? For context, I'm executing in Google Colab using a T4 GPU.
Exception is below: