FlowiseAI / Flowise

Drag & drop UI to build your customized LLM flow
https://flowiseai.com
Apache License 2.0
31.33k stars 16.31k forks source link

[BUG] - Postgres Table Name with Hypen #3215

Closed dkindlund closed 1 day ago

dkindlund commented 1 month ago

Describe the bug I have a multi-page PDF that I have uploaded into the Flowise Document Store. After successfully splitting this PDF into chunks, I try to finally perform an upsert so that the Document Store can create the corresponding vector records -- one vector per chunk. However, after clicking the Upsert button, I get this Oh snap! error message:

Status: 500
Error: documentStoreServices.insertIntoVectorStore - Error: documentStoreServices._insertIntoVectorStoreWorkerThread - QueryFailedError: syntax error at or near "-"

So there is that bug AND the other bug is that this error message appears to NOT be properly word-wrapped, as the error message appears truncated in the browser. I had to go into the HTML output in order to see the full error message:

image

To Reproduce Steps to reproduce the behavior:

  1. Upload a large PDF into the Document Store
  2. Configure Embeddings, Postgres, and Postgres Record Manager settings (make sure your Table Name values contain at least one DASH in them, like foo-bar)
  3. Click Upsert button
  4. See error

Expected behavior A clear and concise description of what you expected to happen.

Screenshots image image image

Setup

Additional context To be crystal clear, when I click the Upsert button, it appears Flowise successfully creates the user_guides_rm-v1 Record Manager Table Schema, but I don't see any corresponding records inserted into that Postgres table:

flowise-data=> \d user_guides_rm-v1
                     Table "public.user_guides_rm-v1"
   Column   |       Type       | Collation | Nullable |      Default
------------+------------------+-----------+----------+-------------------
 uuid       | uuid             |           | not null | gen_random_uuid()
 key        | text             |           | not null |
 namespace  | text             |           | not null |
 updated_at | double precision |           | not null |
 group_id   | text             |           |          |
Indexes:
    "user_guides_rm-v1_pkey" PRIMARY KEY, btree (uuid)
    "group_id_index" btree (group_id)
    "key_index" btree (key)
    "namespace_index" btree (namespace)
    "updated_at_index" btree (updated_at)
    "user_guides_rm-v1_key_namespace_key" UNIQUE CONSTRAINT, btree (key, namespace)

flowise-data=> select * from "user_guides_rm-v1" ;
 uuid | key | namespace | updated_at | group_id
------+-----+-----------+------------+----------
(0 rows)

So the Upsert process never gets to the point of creating the user_guides_vec-v1 Vector table, at all.

Lastly, when I created the document chunks from the PDF, I used a standard character splitter of chunk size 1000 with 0 chunk overlap and specified a custom separator of "\n":

image

@HenryHengZJ and @vinodkiran -- any idea on how to resolve this? Thanks in advance!

dkindlund commented 1 month ago

Note: I upgraded to v2.1.1 and the bug is still present.

dkindlund commented 1 month ago

I also tried specifying NO character text splitter -- same bug persists.

dkindlund commented 1 month ago

Tried using different PDF files -- same bug persists.

dkindlund commented 1 month ago

Tried using CSV files -- same bug persists.

dkindlund commented 1 month ago

Tried using plain text -- same bug persists.

dkindlund commented 1 month ago

@HenryHengZJ are the container images posted to Docker hub stripped of all debugging? The reason I ask, is because I tried specifying DEBUG=true and LOG_LEVEL=debug to get more information on this issue, but I'm not seeing any additional errors in the application logs.

dkindlund commented 1 month ago

Also, it seems the uploaded files are stored inside the Flowise container's disk with some assumption that the files should permanently persist. Instead of using container disk for this, is there any chance it could be stored in the PostgresDB? That way, if a Flowise container restarts and there's no permanent storage, this feature will still work correctly.

Right now, if you use this feature and then restart a Flowise container (without a permanent disk), then these Document Store settings appear corrupted.

dkindlund commented 1 month ago

I was able to replicate the bug completely outside of DocumentStore just by creating a legacy chatflow that uses the same document loader nodes.

From that, I found this stacktrace:

09-18-2024 02:49:03 PM 2024-09-18 18:49:03 [ERROR]: [server]: Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM     at buildFlow (/usr/local/lib/node_modules/flowise/dist/utils/index.js:489:19)
09-18-2024 02:49:03 PM     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
09-18-2024 02:49:03 PM     at async upsertVector (/usr/local/lib/node_modules/flowise/dist/utils/upsertVector.js:126:32)
09-18-2024 02:49:03 PM     at async Object.upsertVectorMiddleware (/usr/local/lib/node_modules/flowise/dist/services/vectors/index.js:9:16)
09-18-2024 02:49:03 PM     at async createInternalUpsert (/usr/local/lib/node_modules/flowise/dist/controllers/vectors/index.js:28:29)
dkindlund commented 1 month ago

Looks like there was an earlier stacktrace:

09-18-2024 02:49:03 PM Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM     at Postgres_VectorStores.upsert (/usr/local/lib/node_modules/flowise/node_modules/flowise-components/dist/nodes/vectorstores/Postgres/Postgres.js:98:27)
09-18-2024 02:49:03 PM     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
09-18-2024 02:49:03 PM     at async buildFlow (/usr/local/lib/node_modules/flowise/dist/utils/index.js:415:37)
09-18-2024 02:49:03 PM     at async upsertVector (/usr/local/lib/node_modules/flowise/dist/utils/upsertVector.js:126:32)
09-18-2024 02:49:03 PM     at async Object.upsertVectorMiddleware (/usr/local/lib/node_modules/flowise/dist/services/vectors/index.js:9:16)
09-18-2024 02:49:03 PM     at async createInternalUpsert (/usr/local/lib/node_modules/flowise/dist/controllers/vectors/index.js:28:29)
09-18-2024 02:49:03 PM 2024-09-18 18:49:03 [ERROR]: [server]: Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM Error: Error: QueryFailedError: syntax error at or near "-"
09-18-2024 02:49:03 PM     at buildFlow (/usr/local/lib/node_modules/flowise/dist/utils/index.js:489:19)
09-18-2024 02:49:03 PM     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
09-18-2024 02:49:03 PM     at async upsertVector (/usr/local/lib/node_modules/flowise/dist/utils/upsertVector.js:126:32)
09-18-2024 02:49:03 PM     at async Object.upsertVectorMiddleware (/usr/local/lib/node_modules/flowise/dist/services/vectors/index.js:9:16)
09-18-2024 02:49:03 PM     at async createInternalUpsert (/usr/local/lib/node_modules/flowise/dist/controllers/vectors/index.js:28:29)
dkindlund commented 1 month ago

Okay, progress! @HenryHengZJ, I think the root cause had to do with the fact that I specified a PostgreSQL table name that includes a dash - in the name.

Specifically, this bug triggers with the table name of: user_guides_vec-v1 This bug does NOT trigger if the table name is just: user_guides_vec (for example)

dkindlund commented 1 month ago

So there's one of two solutions to this: 1) Either add sanity check logic on all Table Name inputs to throw an error whenever a table name contains a dash. OR 2) Figure out and fix why a dash in the table name is causing problems in the first place.

HenryHengZJ commented 1 month ago

From a quick search, having hypen is not the normal naming convention, usually its small letter with underscore. Hypen is a reserved character, but strange thing is that we do escape using double quotes, not sure why it didnt work