Open KHMtravel opened 5 years ago
While investigating a long-running Docsplit job on a PDF that contained no text, I ran into this same issue on my local dev machine (a Rails app on a Vagrant instance running Ubuntu). After the job had run for 10+ minutes, I ran out of disk space. I killed the job and restarted my host machine to get 40 GB back.
I am trying to extract the text of this PDF: https://gofile.io/?c=6U8qE8. I have a Rack application inside a Docker container running on Ubuntu 18.04.
After calling
Docsplit.extract_text('spec/test.pdf', ocr: true, language: 'eng', output: 'spec/output.txt')
I see that the process gs uses the most CPU, and I lose 1 GB of disk space every 5 seconds until there is no space left. Maybe someone has an idea what is going wrong here?
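Until the root cause is found, one way to keep a runaway job from filling the disk is to run the extraction in a child process under a watchdog. The sketch below is only an illustration and not part of Docsplit: `dir_size` and `run_with_disk_budget` are hypothetical helpers, and it assumes the growth happens in the temporary directory (selected via `TMPDIR`), which this thread has not confirmed. If the space is being consumed elsewhere, the watched directory would have to be changed accordingly.

```ruby
require "find"
require "tmpdir"

# Hypothetical helper: total size in bytes of all regular files under +dir+.
def dir_size(dir)
  total = 0
  Find.find(dir) do |path|
    begin
      total += File.size(path) if File.file?(path)
    rescue Errno::ENOENT
      # The file disappeared between listing and stat; skip it.
    end
  end
  total
end

# Hypothetical helper: runs +cmd+ (an argv array) with TMPDIR pointed at a
# fresh scratch directory and kills the whole process group if that directory
# grows beyond +max_bytes+. Returns :ok, :failed, or :over_budget.
# The scratch directory is removed when the method returns.
def run_with_disk_budget(cmd, max_bytes:, poll: 0.2)
  Dir.mktmpdir("scratch") do |scratch|
    # pgroup: true makes the child a process group leader, so helper
    # processes it starts (such as gs) can be killed together with it.
    pid = Process.spawn({ "TMPDIR" => scratch }, *cmd, pgroup: true)
    loop do
      finished, status = Process.wait2(pid, Process::WNOHANG)
      if finished
        return status.success? ? :ok : :failed
      end
      if dir_size(scratch) > max_bytes
        Process.kill("-KILL", pid) # a leading minus signals the whole group
        Process.wait(pid)
        return :over_budget
      end
      sleep poll
    end
  end
end

# Possible usage with the call from this issue (assumes the docsplit gem is
# installed; the 2 GB budget is an arbitrary example value):
#
#   script = "require 'docsplit'; " \
#            "Docsplit.extract_text('spec/test.pdf', ocr: true, " \
#            "language: 'eng', output: 'spec/output.txt')"
#   run_with_disk_budget(["ruby", "-e", script], max_bytes: 2 * 1024**3)
```

Running the job in a separate process group is a deliberate choice here: killing only the Ruby process would leave a child such as gs running and still writing to disk.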