Too much memory used for a big PDF: 800 pages costs 80 GB of RAM

whp98 commented 2 months ago

[Two screenshots attached: 2024-08-25_18-59, 2024-08-25_18-56]

The PDF file is here: https://github.com/skyformat99/books-1/blob/master/%E8%AE%A1%E7%AE%97%E6%9C%BA%E2%97%8F%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80%E2%97%8FJAVA/Java%E7%BC%96%E7%A8%8B%E6%80%9D%E6%83%B3%EF%BC%88%E7%AC%AC4%E7%89%88%EF%BC%89.pdf

whp98 commented 2 months ago

Sometimes it also fails with a CUDA OOM error. My GPU is a 4060 Ti 16 GB.

The PDF is this one: https://github.com/yuanliangding/books/blob/master/%E8%AE%A1%E7%AE%97%E6%9C%BA-%E7%BC%96%E7%A8%8B%E8%AF%AD%E8%A8%80-JAVA/Java%E5%B9%B6%E5%8F%91%E7%BC%96%E7%A8%8B%E5%AE%9E%E6%88%98.pdf

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 0 has a total capacity of 15.60 GiB of which 747.88 MiB is free. Including non-PyTorch memory, this process has 2.46 GiB memory in use. Of the allocated memory 2.14 GiB is allocated by PyTorch, and 166.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Error converting PDF to Markdown: Command '['marker_single', '/home/zzz/文档/PDF/Java并发编程实战.pdf', '/home/sss/ dsadas/pdf-to-markdown/output']' returned non-zero exit status 1.

frankbaele commented 2 months ago

That's not an absurd thing to run into; many PDF services have page or file limits for this reason. You can work around it by slicing your PDFs with another PDF library and then joining the results at the end.
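
A rough sketch of that slice-and-join workaround, in case it helps. It assumes the third-party `pypdf` package for splitting (any PDF library would do) and reuses the `marker_single <pdf> <output_dir>` argument order seen in the error log earlier in this thread. The helper names (`chunk_ranges`, `split_pdf`, `convert_part`, `join_markdown`) and the chunk size of 100 pages are made up for illustration, not part of marker.

```python
import subprocess
from pathlib import Path


def chunk_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, stop) page ranges that cover total_pages."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, total_pages))
            for start in range(0, total_pages, chunk_size)]


def split_pdf(src: Path, out_dir: Path, chunk_size: int = 100) -> list[Path]:
    """Write src as several smaller PDFs and return their paths in page order."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(src))
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for index, (start, stop) in enumerate(ranges):
        writer = PdfWriter()
        for page_number in range(start, stop):
            writer.add_page(reader.pages[page_number])
        part = out_dir / f"{src.stem}_part{index:03d}.pdf"
        with open(part, "wb") as handle:
            writer.write(handle)
        parts.append(part)
    return parts


def convert_part(part: Path, out_dir: Path) -> None:
    """Run marker on one chunk; argument order copied from the log in this thread."""
    subprocess.run(["marker_single", str(part), str(out_dir)], check=True)


def join_markdown(md_files: list[Path], dest: Path) -> None:
    """Concatenate Markdown files in the given order, separated by a blank line."""
    texts = [path.read_text(encoding="utf-8").strip() for path in md_files]
    dest.write_text("\n\n".join(texts) + "\n", encoding="utf-8")
```

Where marker writes the Markdown for each chunk is not shown in this thread, so collecting those files is left to the caller: gather them in page order and pass the list to `join_markdown`. Smaller chunks should lower peak memory per run at the cost of more process start-ups, and anything that spans a chunk boundary (a table split across pages, for example) may come out worse than in a single pass.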

VikParuchuri commented 1 week ago

The CPU RAM issue should be fixed now.

VikParuchuri / marker, issue #269