FlowiseAI / Flowise

Drag & drop UI to build your customized LLM flow
https://flowiseai.com
Apache License 2.0

[BUG] PDF Loaders: Invalid PDF structure #1855

Closed: forhonourlx closed this issue 8 months ago

forhonourlx commented 9 months ago

Describe the bug

Hi, I built a basic chatflow that loads a PDF, but I get errors with several different PDF files (in English and Chinese). How can I fix this? Is there any size or encoding restriction, and if so, how can I adjust it? Thanks in advance.

example: https://arxiv.org/pdf/2402.17177.pdf

2024-03-02 21:48:55 Warning: Indexing all PDF objects
2024-03-02 21:48:55 2024-03-02 13:48:55 [ERROR]: Invalid PDF structure
2024-03-02 21:48:55 Error
2024-03-02 21:48:55     at InvalidPDFExceptionClosure (/usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:452:35)
2024-03-02 21:48:55     at Object.<anonymous> (/usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:455:2)
2024-03-02 21:48:55     at __w_pdfjs_require__ (/usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:45:30)
2024-03-02 21:48:55     at Object.<anonymous> (/usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:7939:23)
2024-03-02 21:48:55     at __w_pdfjs_require__ (/usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:45:30)
2024-03-02 21:48:55     at /usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:88:18
2024-03-02 21:48:55     at /usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:91:10
2024-03-02 21:48:55     at webpackUniversalModuleDefinition (/usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:18:20)
2024-03-02 21:48:55     at Object.<anonymous> (/usr/local/lib/node_modules/flowise/node_modules/pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js:25:3)
2024-03-02 21:48:55     at Module._compile (node:internal/modules/cjs/loader:1356:14)
2024-03-02 21:48:55     at Module._extensions..js (node:internal/modules/cjs/loader:1414:10)
2024-03-02 21:48:55     at Module.load (node:internal/modules/cjs/loader:1197:32)
2024-03-02 21:48:55     at Module._load (node:internal/modules/cjs/loader:1013:12)
2024-03-02 21:48:55     at Module.require (node:internal/modules/cjs/loader:1225:19)
2024-03-02 21:48:55     at require (node:internal/modules/helpers:177:18)
2024-03-02 21:48:55     at /usr/local/lib/node_modules/flowise/node_modules/flowise-components/dist/nodes/documentloaders/Pdf/Pdf.js:105:165
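When a parser reports "Invalid PDF structure" for several unrelated files, it can help to first rule out that the uploaded bytes are truncated or are not a PDF at all (for example, an HTML error page saved with a .pdf extension). The sketch below is only a rough sanity check, not a parser: `checkPdfBytes` and `indexOfBytes` are hypothetical helpers that are not part of Flowise or pdf-parse, and the 1024-byte search windows are an assumption chosen for leniency.

```typescript
// Hypothetical diagnostic helpers; not part of Flowise or pdf-parse.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d]; // ASCII "%PDF-"
const EOF_MARKER = [0x25, 0x25, 0x45, 0x4f, 0x46]; // ASCII "%%EOF"
const WINDOW = 1024; // assumed search window at each end of the file

// Return the first index in [from, to) where `pattern` occurs fully, or -1.
function indexOfBytes(data: Uint8Array, pattern: number[], from: number, to: number): number {
  const end = Math.min(data.length, to);
  for (let i = Math.max(0, from); i + pattern.length <= end; i++) {
    let match = true;
    for (let j = 0; j < pattern.length; j++) {
      if (data[i + j] !== pattern[j]) {
        match = false;
        break;
      }
    }
    if (match) return i;
  }
  return -1;
}

interface PdfSanity {
  hasHeader: boolean; // "%PDF-" found near the start
  hasEofMarker: boolean; // "%%EOF" found near the end
}

// Check only the outer markers of the file; a file can pass both checks
// and still be rejected by a real parser.
function checkPdfBytes(data: Uint8Array): PdfSanity {
  return {
    hasHeader: indexOfBytes(data, PDF_MAGIC, 0, WINDOW) !== -1,
    hasEofMarker: indexOfBytes(data, EOF_MARKER, data.length - WINDOW, data.length) !== -1,
  };
}
```

If `hasHeader` is false, the file is probably not a PDF; if only `hasEofMarker` is false, the download may have been cut short. If both are true, the file itself is less likely to be the cause, and the loader configuration is the next thing to look at.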

HenryHengZJ commented 9 months ago

Have you tried turning on the Legacy Build option of the PDF loader to see if that helps?
