Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

[BUG]: Some confluence doc name cannot be saved - throwing ENOENT: no such file or directory #1737

Closed jazelly closed 2 months ago

jazelly commented 2 months ago

How are you running AnythingLLM?

Local development

What happened?

We have a Confluence doc named 2021-09-15 "Tested" Retrospective. When I used the Confluence connector to import it, I got the following error.

Error: ENOENT: no such file or directory, open 'E:\github\my\anything-llm\server\storage\documents\XXXX\2021-09-15-"Tested"-Retrospective-bf2937a8-15bc-4da9-9b1b-efce42d892e4.json'
    at Object.writeFileSync (node:fs:2348:20)
    at writeToServerDocuments (E:\github\my\anything-llm\collector\utils\files\index.js:59:6)
    at E:\github\my\anything-llm\collector\utils\extensions\Confluence\index.js:92:5
    at Array.forEach (<anonymous>)
    at loadConfluence (E:\github\my\anything-llm\collector\utils\extensions\Confluence\index.js:72:8)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async E:\github\my\anything-llm\collector\extensions\index.js:115:43 {
  errno: -4058,
  code: 'ENOENT',
  syscall: 'open',
}

Are there known steps to reproduce?

  1. Have a Confluence document whose title contains double quotes.
  2. Use the Confluence connector to scrape it.

jazelly commented 2 months ago

After digging around, I found it's not related to the quotation marks: they are quite common in other documents, and those documents are saved correctly.

timothycarambat commented 2 months ago

E:\github\my\anything-llm\server\storage\documents\XXXX\2021-09-15-"Tested"-Retrospective-bf2937a8-15bc-4da9-9b1b-efce42d892e4.json is shorter than the MAX_PATH limit for Windows, which would be the other non-obvious issue that could block a file write, aside from disk space and permissions.
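
The path-length check can be sketched as below. This is a minimal illustration, assuming the classic Windows MAX_PATH of 260 characters (which includes the terminating null) and no long-path opt-in. exceedsMaxPath is a hypothetical helper, not something from the AnythingLLM codebase.

```javascript
// Hypothetical helper for illustration; not part of AnythingLLM.
// Assumption: classic Windows limit, MAX_PATH = 260 characters including
// the terminating null, so at most 259 usable characters in a full path.
const MAX_PATH = 260;

function exceedsMaxPath(fullPath) {
  return fullPath.length > MAX_PATH - 1;
}

// The path from the error message above (XXXX is elided in the report).
const reported =
  'E:\\github\\my\\anything-llm\\server\\storage\\documents\\XXXX\\' +
  '2021-09-15-"Tested"-Retrospective-bf2937a8-15bc-4da9-9b1b-efce42d892e4.json';

// Logs the path length and whether it is over the assumed limit.
console.log(reported.length, exceedsMaxPath(reported));
```

Since the reported path is well under the limit, path length can be ruled out and the characters in the file name become the main suspect.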

timothycarambat commented 2 months ago

Out of curiosity, double quotes are prohibited chars in file name. Does removing them from the Confluence document title allow the file to save? I know you say others are allowed, but I'm wondering if they are encoded differently or something

jazelly commented 2 months ago

double quotes are prohibited chars in file name.

I think the quotation is indeed the culprit.

Does removing them from the Confluence document title allow the file to save?

Yes, simply removing the quotation marks makes it work, but I discovered another problem; see below.

I know you say others are allowed

The reason I said other files were saved successfully is that I had a file called My Dashboard : manage "Admin" access, and it did get saved, but the saved filename is incomplete (My-Dashboard-) and the content is empty.

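The reason is the colon (:). It's a reserved character in Windows file names: it is used in drive notation, and on NTFS it separates a file name from an alternate data stream name, which would explain the truncated name and the empty file. It works fine on macOS, IIRC.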

Generally, I think the filename should be sanitized before it is passed to writeFileSync. I can raise a PR for this.
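
One way such sanitization could look is sketched below. This is an assumption for illustration, not the change that was actually made in the project: sanitizeFileName, the "-" replacement character, and the handling of reserved device names are all hypothetical choices. It replaces the characters Windows rejects in file names (< > : " / \ | ? * and control characters) before the name reaches writeFileSync.

```javascript
// Hypothetical sketch; not the fix merged into AnythingLLM.
// Characters that Windows does not allow in file names, plus control chars.
const RESERVED_CHARS = /[<>:"\/\\|?*\x00-\x1f]/g;
// Bare device names that Windows reserves (checked without extension here).
const RESERVED_NAMES = /^(con|prn|aux|nul|com[1-9]|lpt[1-9])$/i;

function sanitizeFileName(name, replacement = "-") {
  let safe = name
    .replace(RESERVED_CHARS, replacement) // swap out forbidden characters
    .replace(/[. ]+$/, ""); // Windows strips trailing dots and spaces
  // Prefix names that are empty or collide with a reserved device name.
  if (safe === "" || RESERVED_NAMES.test(safe)) safe = `_${safe}`;
  return safe;
}

console.log(sanitizeFileName('2021-09-15-"Tested"-Retrospective'));
// prints: 2021-09-15--Tested--Retrospective
console.log(sanitizeFileName('My-Dashboard-:-manage-"Admin"-access'));
// prints: My-Dashboard---manage--Admin--access
```

Applying the same rules on every platform, rather than only on Windows, keeps generated file names identical across operating systems, at the cost of altering names that macOS or Linux would have accepted as-is.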