VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0
14.65k stars, 763 forks

RuntimeError: Invalid buffer size 17.26 GB when using LayoutLMv3 model in PDF conversion script #106

Closed: archit15singh closed this issue 2 months ago

archit15singh commented 2 months ago


Description

I am encountering a runtime error while converting a PDF to Markdown with a script that uses the LayoutLMv3 model from the transformers library. The script supports parallel processing, but it fails when it tries to allocate a very large buffer.

Error Details

Running the script on a PDF with --parallel_factor 12 results in the following error:

RuntimeError: Invalid buffer size: 17.26 GB

This error occurs during the forward pass of the LayoutLMv3 model.

Steps to Reproduce

  1. Run the conversion script with the following command:
    python convert_single.py assets/designing-web-apis-building-apis-that-developers-love-9781492026921-1492026921_compress.pdf assets/designing-web-apis-building-apis-that-developers-love-9781492026921-1492026921_compress.md --parallel_factor 12

System Information

Looking forward to suggestions or patches

archit15singh commented 2 months ago

Reducing --parallel_factor from 12 to 2 fixed it!
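
For anyone hitting the same error, the manual fix of lowering the parallel factor could be automated with a small retry loop. This is only a sketch under stated assumptions: `run_with_backoff` and the `convert` callable are hypothetical names, not part of marker, and `convert` stands in for whatever function you use to run the conversion at a given parallel factor. It assumes the failure surfaces as a `RuntimeError` whose message contains "Invalid buffer size" (as reported above) or "out of memory".

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_backoff(
    convert: Callable[[int], T],
    parallel_factor: int,
    min_factor: int = 1,
) -> Tuple[T, int]:
    """Call convert(factor), halving factor after each memory-related RuntimeError.

    `convert` is a hypothetical wrapper around your own conversion call.
    Returns the conversion result and the factor that finally worked.
    """
    factor = parallel_factor
    while True:
        try:
            return convert(factor), factor
        except RuntimeError as exc:
            message = str(exc).lower()
            # Assumption: memory failures are recognizable from the message text.
            is_memory_error = (
                "invalid buffer size" in message or "out of memory" in message
            )
            # Re-raise unrelated errors, or give up once the floor is reached.
            if not is_memory_error or factor <= min_factor:
                raise
            factor = max(min_factor, factor // 2)
```

Starting from 12, the loop would try 12, 6, 3, then 1, so it may settle on a smaller factor than the largest one that actually fits in memory. Halving is just a simple choice here; stepping down by one would find a tighter value at the cost of more failed attempts.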