The problem occurs when splitPdfPage: true is set and splitPdfConcurrencyLevel is higher than 1, that is, when documents are split and the pages are processed concurrently.
It works reliably with splitPdfPage: false or splitPdfConcurrencyLevel: 1.
One user reported a problem: https://unstructuredw-kbe4326.slack.com/archives/C044N0YV08G/p1720447185282449
Initially an older Node version was suspected, but the problem persists after switching to Node 21. The user tried on Linux (Ubuntu) and macOS. Unstructured tried to reproduce it using the user's files but could not get consistent results. The bug may be intermittent and not appear on every run. The files and a code sample are attached.
554504_RC_00_M2023_013_AUDITS_ENERGETIQUES_RC (1).pdf 00_M2023_013_AUDITS_ENERGETIQUES_RC (1).pdf M2023_013_AUDITS_ENERGETIQUES_CCP.pdf
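
Because the failure seems intermittent, a single run per configuration is probably not enough to confirm or rule it out. The following is a minimal sketch of a repro harness, not the user's code sample. It assumes the actual SDK call is wrapped in a caller-supplied function (`PartitionFn`, a hypothetical name) that takes the two split options and resolves to the list of returned elements. Since the report does not say what the failure looks like, the sketch treats both a thrown error and a varying element count across runs as signs of instability; that is an assumption.

```typescript
// Hypothetical option bag: only the two settings named in this report.
interface SplitOptions {
  splitPdfPage: boolean;
  splitPdfConcurrencyLevel: number;
}

// Hypothetical wrapper around the real SDK call. The caller supplies it,
// so this sketch does not depend on any particular SDK version.
type PartitionFn = (options: SplitOptions) => Promise<unknown[]>;

interface ReproResult {
  options: SplitOptions;
  runs: number;
  failures: number;        // runs that threw
  elementCounts: number[]; // element count of each successful run
  errors: string[];        // message of each thrown error
}

// Run the same request `runs` times, one after another, and record
// how often it throws and how many elements each successful run returns.
async function runRepro(
  partition: PartitionFn,
  options: SplitOptions,
  runs: number,
): Promise<ReproResult> {
  const result: ReproResult = {
    options,
    runs,
    failures: 0,
    elementCounts: [],
    errors: [],
  };
  for (let i = 0; i < runs; i++) {
    try {
      const elements = await partition(options);
      result.elementCounts.push(elements.length);
    } catch (err) {
      result.failures += 1;
      result.errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  return result;
}

// Assumption: "stable" means no run threw and every run returned
// the same number of elements.
function isStable(result: ReproResult): boolean {
  return result.failures === 0 && new Set(result.elementCounts).size <= 1;
}
```

Running this for each attached file with splitPdfPage: true and splitPdfConcurrencyLevel above 1, and again with splitPdfConcurrencyLevel: 1, would give a failure rate per configuration instead of a single pass/fail observation.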