VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0

Error when running marker in docker-compose #192

Open sogand145 opened 4 months ago

sogand145 commented 4 months ago

Hi, I'm using docker-compose to use marker in the container, but I get this error:

[image: error-in-marker]

and this is the Dockerfile:

FROM python:3.9-bullseye

RUN apt-get update && apt-get upgrade -y

RUN apt install build-essential libpoppler-cpp-dev pkg-config python3-dev openjdk-11-jdk ghostscript ocrmypdf -y

ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
ENV OCR_ENGINE=ocrmypdf
ENV TORCH_DEVICE=cpu

RUN pip install marker-pdf ocrmypdf

WORKDIR /app

COPY ./by_marker-pdf /app/

CMD ["/bin/bash"]

I should mention that there are no files in the "marker" folder, so I don't know how to change the default values in settings.py. I want to use the ocrmypdf engine and run on the CPU instead of the GPU. How can I change the default values?

Thanks in advance
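One way to narrow this down is to check, from inside the container, whether the overrides set in the Dockerfile are actually visible to the Python process. The sketch below assumes marker reads settings such as `OCR_ENGINE` and `TORCH_DEVICE` from environment variables, which is the same assumption the Dockerfile makes; the helper `report_overrides` is hypothetical and not part of marker.

```python
import os

# Setting names and values taken from the Dockerfile in this issue. Whether
# marker honors them as environment overrides is an assumption here.
EXPECTED = {"OCR_ENGINE": "ocrmypdf", "TORCH_DEVICE": "cpu"}


def report_overrides(expected, environ=None):
    """Return {name: (wanted, actual, matches)} for each expected variable."""
    environ = os.environ if environ is None else environ
    report = {}
    for name, wanted in expected.items():
        actual = environ.get(name)  # None when the variable is not set
        report[name] = (wanted, actual, actual == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, actual, ok) in report_overrides(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted={wanted!r} actual={actual!r} [{status}]")
```

Running this with `docker compose run` against the same service shows whether a compose-level `environment:` entry or a missing `ENV` line is hiding the values before marker ever starts.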

mdoughty-tagleaf commented 4 months ago

I am experiencing the same issue: the process is killed during bounding box detection. I am using the following Dockerfile:

FROM bitnami/pytorch

USER root

# Update container
RUN apt-get update
RUN apt-get upgrade -y

# Open GL
RUN apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0
RUN rm -rf /var/lib/apt/lists/*

USER 1001

# Marker
RUN pip install marker-pdf

and this Docker Compose YAML:

services:
  pdf-service:
    build:
      context: .
      dockerfile: build/PdfService.dockerfile
    tty: true
    ports:
      - "80:8484"
    volumes:
      - <pwd>/cache:/app/cache
      - <pwd>/src:/app/src
      - <pwd>/out:/app/out
      - ${HOME}/Downloads:/app/storage
    environment:
      - HF_HOME=/app/cache
      - HOME=/app/cache

and attempting a single file parse in the container.

marker_single ./storage/<file> ./out

mdoughty-tagleaf commented 4 months ago

@sogand145, I have managed to get past this being killed business. This is a RAM-intensive tool, so the solution is to significantly increase the resources available to Docker. I cracked it open to 13.5GB RAM in the Docker Desktop settings, and added the following to my compose YAML. Now it successfully detects a few bounding boxes before encountering a new and exciting error 🫠

    deploy:
      resources:
        limits:
          memory: 12G
          cpus: '6'
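To confirm that a limit like this actually reached the container, the cgroup memory limit can be read from inside it. This is a minimal sketch, assuming a Linux container with the usual cgroup v2 or v1 mount points; the function names are made up for illustration.

```python
from pathlib import Path

# Usual cgroup locations for the memory limit inside a Linux container.
CGROUP_V2 = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_limit(raw):
    """Convert a cgroup limit string to bytes; None means no limit."""
    raw = raw.strip()
    if raw == "max":  # cgroup v2 spelling of "unlimited"
        return None
    return int(raw)


def container_memory_limit():
    """Return the limit in bytes, or None if unlimited or not readable."""
    for path in (CGROUP_V2, CGROUP_V1):
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            continue  # file missing or unexpected content; try the next one
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    if limit is None:
        print("no cgroup memory limit found")
    else:
        print(f"memory limit: {limit / 2**30:.1f} GiB")
```

Note that cgroup v1 reports "unlimited" as a very large integer rather than `max`, so a huge value there also means no effective limit.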

ujconsulting commented 3 months ago

Same problem here. I'm running on Windows Server 2022 in a WSL2 Ubuntu environment. Memory limits should not be an issue because it's limited to 64GB per machine...

marker_single blows WSL up to more than 32GB of used memory with a 7.6MB, 500-page PDF file, and then it's killed.

[image: 240806_marker_error]
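The memory that counts here is what the WSL2 VM itself sees, which can be lower than the host's physical RAM. A minimal sketch for checking it from inside the Ubuntu environment by reading `/proc/meminfo`; the helper name `parse_meminfo` is hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    total_gib = info["MemTotal"] / 2**20  # kB -> GiB
    avail_gib = info.get("MemAvailable", 0) / 2**20
    print(f"MemTotal: {total_gib:.1f} GiB, MemAvailable: {avail_gib:.1f} GiB")
```

If `MemTotal` comes out well below 64GB, the kill is probably the VM running out of memory, not the host.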