Closed by Nasa1423 2 weeks ago
Thanks! We need to check it out 🙏
After a short investigation, I think the fix is to add the following to app/celery_config.py:
import multiprocessing
multiprocessing.set_start_method("spawn")
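As a minimal sketch of that suggestion, the snippet below wraps the call in a helper so it can be run more than once without raising. The helper name `ensure_spawn_start_method` is hypothetical and not part of the project. It only covers the standard-library `multiprocessing` module; whether this is enough for a given Celery worker pool is not confirmed here.

```python
import multiprocessing


def ensure_spawn_start_method() -> str:
    """Switch multiprocessing to the 'spawn' start method and return the active one.

    Hypothetical helper for illustration; the thread only proposes the bare
    set_start_method("spawn") call.
    """
    # allow_none=True reports None when no start method has been fixed yet,
    # instead of locking in the platform default as a side effect.
    current = multiprocessing.get_start_method(allow_none=True)
    if current != "spawn":
        # force=True avoids "RuntimeError: context has already been set"
        # when another module has already chosen a start method.
        multiprocessing.set_start_method("spawn", force=True)
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    print(ensure_spawn_start_method())
```

Calling the helper at the top of the config module, before any worker processes are created, matches the placement suggested above.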
Probably fixed with https://github.com/CatchTheTornado/pdf-extract-api/pull/9 - please confirm @Nasa1423
This fix is included in the latest branch I pulled, but the problem still occurs in actual use. Output: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
app/celery_config.py:
from celery import Celery
import multiprocessing
multiprocessing.set_start_method("spawn", force=True)
How can I solve this problem? Thank you.
Hey @PoleGeogry! You were right. Fixed with #25, please check it. I've also added instructions on how to run it locally to support other GPUs (which are not supported by Docker yet).
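For readers hitting the same error: it generally means CUDA was already initialized in the parent process before a worker was forked. The thread does not show what #25 changed, so the following is only a sketch of one common workaround, not the project's fix: defer loading anything that touches CUDA until the first call inside the worker process. `PerProcessLazy` and the stand-in loader are hypothetical names; a real loader would build the OCR model on the GPU.

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class PerProcessLazy(Generic[T]):
    """Load a resource on first use and reload it if the process ID changes.

    Hypothetical helper: it keeps the (CUDA-touching) loader from running at
    import time in the parent, so each worker process initializes its own copy.
    """

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._pid: Optional[int] = None

    def get(self) -> T:
        pid = os.getpid()
        # A different PID means we are in a new (e.g. forked) process, so the
        # value cached by the parent must not be reused.
        if self._pid != pid:
            self._value = self._loader()
            self._pid = pid
        return self._value


if __name__ == "__main__":
    # Stand-in loader; an assumption replacing real model construction.
    model = PerProcessLazy(lambda: {"device": "cuda", "pid": os.getpid()})
    print(model.get())
```

A task function would then call `model.get()` inside its body rather than creating the model at module level.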
I'm trying to run docker-compose -f docker-compose.gpu.yml up -d --build
and get this output:
python client/cli.py ocr --file test.pdf --prompt_file .\examples\parse-table.txt
Namespace(command='ocr', file='test.pdf', ocr_cache=True, prompt=None, prompt_file='.\examples\parse-table.txt', model='llama3.1', strategy='marker', print_progress=True)
File uploaded successfully. Task Id: f1c9c412-cc9e-4299-ae66-31cd47de785d
Waiting for the result...
{'state': 'PROGRESS', 'status': 'Extracting text from PDF', 'info': {'progress': 30, 'status': 'Extracting text from PDF', 'elapsed_time': 0.8517706394195557}}
{'state': 'FAILURE', 'status': "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method"}
OCR task failed.