FlowiseAI / Flowise

Drag & drop UI to build your customized LLM flow
https://flowiseai.com
Apache License 2.0

[BUG] "Error: End of data reached" returned when using Upsert API to upsert documents #3041

Open briansoegaard opened 3 weeks ago

briansoegaard commented 3 weeks ago

**Describe the bug**
I want to upsert different files (e.g. PDF) via the API. Some Document Loaders return an error when used via the Upsert API. The "Text File" Document Loader works fine, but "Pdf File" and "Docx File", for example, return the following error (content of the JSON object returned by the requests.post(...) call):

{'statusCode': 500, 'success': False, 'message': 'Error: vectorsService.upsertVector - Error: End of data reached (data length = 0, asked index = 4). Corrupted zip ?', 'stack': {}}

I can't override the settings in the Pdf File or Docx File Document Loaders node via the API.

**To Reproduce**

  1. Just strictly follow the Python code example provided in the documentation: Document Loaders with Upload. Use a basic .txt file. That works.
  2. Reset your vector store or just define another Namespace in the code to start over.
  3. Save the text file as a .pdf file or .docx file (for example), and in the 'form_data' replace the code accordingly, e.g.:

         form_data = {
             "files": ('text.pdf', open('text.pdf', 'rb'))
         }

  4. When running the code, the error message above ('Error: vectorsService.upsertVector - Error: End of data reached (data length = 0, asked index = 4). Corrupted zip ?') is returned.
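
For reference, the steps above can be sketched as a small script. This is only a rough reconstruction, not the exact code from the screenshot: the `API_URL` value (host, port, and the `<chatflowId>` placeholder) and the helper names `build_form_data` and `upsert` are assumptions for illustration. The client-side empty-file check is also an addition, so that a zero-byte upload can be ruled out before the request is sent.

```python
import os

# Assumed endpoint: adjust host, port, and chatflow ID to your own Flowise instance.
API_URL = "http://localhost:3000/api/v1/vector/upsert/<chatflowId>"


def build_form_data(path):
    """Build the multipart 'files' mapping for one document (hypothetical helper).

    Raises ValueError if the path is not a file or the file is empty,
    so an empty upload is caught before anything is sent.
    """
    if not os.path.isfile(path):
        raise ValueError(f"not a file: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"file is empty: {path}")
    # Same shape as in step 3: (filename, binary file handle)
    return {"files": (os.path.basename(path), open(path, "rb"))}


def upsert(path, api_url=API_URL):
    """POST one file to the upsert endpoint and return the parsed JSON reply."""
    import requests  # third-party; imported here so the helper above stays stdlib-only

    form_data = build_form_data(path)
    try:
        response = requests.post(api_url, files=form_data)
        return response.json()
    finally:
        # Always close the file handle opened in build_form_data
        form_data["files"][1].close()


# Example (needs a running Flowise instance and a real chatflow ID):
#   print(upsert("text.txt"))   # step 1: works
#   print(upsert("text.pdf"))   # step 3: returns the statusCode 500 error above
```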

If the node is 'configured' in the chatflow via the UI (the text.pdf file is uploaded), the error does not occur, but then that file is upserted every time I call the API, no matter which file I send via the API. I can't override the settings in the Pdf File Document Loader node via the API.

**Expected behavior**
I expect the "Pdf File", "Docx File", etc. Document Loaders to work like the "Text File" Document Loader where the form_data object overrides the settings in my node. 

**Screenshots**
The simple chatflow to reproduce it:
<img width="663" alt="Upsert API Bug demo" src="https://github.com/user-attachments/assets/8f56c474-1fe0-4032-a649-dff5f2618edd">

The test Python code:
<img width="1094" alt="Upsert API Bug demo - code" src="https://github.com/user-attachments/assets/f13e7794-70ca-4131-b2ef-d216af1af443">

**Setup**
-   Flowise Version: 2.0.5
-   OS: Latest macOS
-   Vector store: Pinecone

HenryHengZJ commented 2 weeks ago

Question: have you tried other pdf or docx files? I'm guessing the file is corrupted because you saved the text file in another format.

briansoegaard commented 2 weeks ago

It's definitely not the pdf file - it's a regular PDF. Docx files saved from Word fail too. The bug seems to be related to the need to configure the node in Flowise before it's even possible to use the API.
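
To rule out the corrupted-file theory locally, a quick sanity check along these lines may help. It is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it assumes that a well-formed PDF starts with the bytes `%PDF-` and that a Word-produced .docx is a ZIP archive containing `word/document.xml`, which is the usual layout but not a full validation of either format.

```python
import zipfile


def looks_like_pdf(path):
    """Return True if the file starts with the usual PDF header bytes."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


def looks_like_docx(path):
    """Return True if the file is a ZIP archive with the usual Word main part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "word/document.xml" in archive.namelist()
```

A plain text file that was merely renamed to .pdf or .docx fails both checks, while a file exported from Word or a PDF printer should pass. If the file passes and the API still reports `data length = 0`, the problem is more likely in how the upload reaches the loader than in the file itself.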

briansoegaard commented 2 weeks ago

Any further thoughts on this bug, @HenryHengZJ ? It looks like it's impossible to upsert pdf and docx files at all via the API. To me, that's a critical bug to anyone who integrates their chatflows in solutions with a knowledgebase that can change over time.

HenryHengZJ commented 2 weeks ago

That's strange, I tried the following and it works:

1.) Have a chatflow with PDF loader: image

2.) Execute the POST call: image

RobertinaRenzi commented 5 days ago

@HenryHengZJ Yes, it works, but if you then ask something about the document through the Prediction API, it says that no document was loaded.