CatchTheTornado / pdf-extract-api

Document (PDF) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.
https://demo.doctractor.com
GNU General Public License v3.0
1.33k stars, 86 forks

Cannot re-initialize CUDA in forked subprocess #6

Closed: Nasa1423 closed this issue 2 weeks ago

Nasa1423 commented 3 weeks ago

I'm running docker-compose -f docker-compose.gpu.yml up -d --build and get this output:

python client/cli.py ocr --file test.pdf --prompt_file .\examples\parse-table.txt
Namespace(command='ocr', file='test.pdf', ocr_cache=True, prompt=None, prompt_file='.\examples\parse-table.txt', model='llama3.1', strategy='marker', print_progress=True)
File uploaded successfully. Task Id: f1c9c412-cc9e-4299-ae66-31cd47de785d
Waiting for the result...
{'state': 'PROGRESS', 'status': 'Extracting text from PDF', 'info': {'progress': 30, 'status': 'Extracting text from PDF', 'elapsed_time': 0.8517706394195557}}
{'state': 'FAILURE', 'status': "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method"}
OCR task failed.

pkarw commented 3 weeks ago

Thanks! We need to check it out 🙏

pkarw commented 3 weeks ago

After a short investigation, I think the fix is to add:


import multiprocessing

multiprocessing.set_start_method("spawn")

to app/celery_config.py.
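A minimal sketch, assuming the goal is only to guarantee that the start method is "spawn" before any worker process is created. The standard library's `multiprocessing.set_start_method` raises `RuntimeError` ("context has already been set") if it is called again without `force=True`, so this helper checks the current value first and can safely run more than once. The name `ensure_spawn_start_method` is hypothetical and not part of this repository.

```python
import multiprocessing


def ensure_spawn_start_method() -> str:
    """Make 'spawn' the start method for the stdlib multiprocessing module.

    Idempotent: calling it repeatedly does not raise, unlike a bare
    second call to multiprocessing.set_start_method("spawn").
    """
    # allow_none=True returns None if no start method has been fixed yet,
    # instead of silently fixing the platform default.
    current = multiprocessing.get_start_method(allow_none=True)
    if current != "spawn":
        # force=True overrides a start method that was already fixed elsewhere.
        multiprocessing.set_start_method("spawn", force=True)
    return multiprocessing.get_start_method()
```

With "spawn", each child starts a fresh interpreter instead of inheriting a copy of the parent's memory, which is why it avoids inheriting an already-initialized CUDA context.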

pkarw commented 3 weeks ago

This is probably fixed by https://github.com/CatchTheTornado/pdf-extract-api/pull/9. Please confirm, @Nasa1423.

PoleGeogry commented 2 weeks ago

The latest branch I pulled includes this fix, but the problem still occurs in practice. Output: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

app/celery_config.py:

from celery import Celery
import multiprocessing

multiprocessing.set_start_method("spawn", force=True)

How can I solve this problem? Thank you.
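One possible explanation, offered as an assumption rather than something confirmed in this thread: Celery's default prefork pool creates its worker processes through the billiard library, so setting the start method on the standard multiprocessing module may not change how those workers are started. A mitigation that does not depend on the start method is to make sure nothing touches the GPU at import time in the parent process and to load the model lazily inside the task. Running the worker with `--pool=solo` (a single process, no forking) is another commonly used workaround. In the sketch below, `get_model`, `ocr_task`, and the injected `loader` are hypothetical names, not this project's API; the loader is passed in so the example runs without CUDA.

```python
from typing import Callable, Optional

# Module-level cache. It stays None in the parent as long as nothing calls
# get_model() at import time, so a forked worker inherits no GPU state.
_model: Optional[object] = None


def get_model(loader: Callable[[], object]) -> object:
    """Load the model on first use in the current process, then reuse it.

    In a real worker, `loader` would be the (hypothetical) function that
    moves the OCR model onto the GPU; CUDA is then first initialized
    inside the worker, not in the parent that forked it.
    """
    global _model
    if _model is None:
        _model = loader()
    return _model


def ocr_task(path: str, loader: Callable[[], object]) -> str:
    """Hypothetical task body: fetch the lazily loaded model and use it."""
    model = get_model(loader)
    return f"{path}: processed with {model!r}"
```

This only helps if the parent really never initializes CUDA; if a model is loaded on the GPU before the fork, children still hit the same error.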

pkarw commented 2 weeks ago

Hey @PoleGeogry! You were right. Fixed in #25; please check it. I've also added instructions on how to run it locally with other GPUs (which are not supported by Docker yet).