FlowiseAI / Flowise

Drag & drop UI to build your customized LLM flow
https://flowiseai.com
Apache License 2.0
29.79k stars, 15.34k forks

[BUG] #2074

Closed Nexacybersecurity closed 5 months ago

Nexacybersecurity commented 5 months ago

Hi, I am new to the GitHub community as well as to Flowise. I am looking for a way to chat with my documents using ChatGPT or Google Gemini Pro. I set up Flowise on multiple servers by following the videos listed on the website, but I am facing the following issue.

The Pinecone upsert does not do what it is supposed to do: the data never reaches the Pinecone database, so the chat box gives generic answers about whatever document (plain text, text file, or PDF) I upload to the flow. After making various changes, I was once able to run the entire flow, and the text from my file was split and uploaded as vectors to the Pinecone database. But when I changed the file, I kept getting the old data through the chat.

I restarted the service and the server, and installed Flowise with both Docker and Node.js. I have tried everything but have not been able to chat successfully with any of my documents so far. Can anyone please help?

toi500 commented 5 months ago

Can you share your chatflow so we can check it out? It is important to know what sort of document you are upserting, how it is split, etc.

Also, please share a capture from Pinecone like this one, so we can see how your data is being saved:

Screenshot 2024-03-31 143826

Nexacybersecurity commented 5 months ago

Hi, I used the template "Conversational Retrieval QA Chain" from the marketplace and filled in the required information. The screenshot is below.

I tried Plain Text, Text File and Pdf File from the Document Loaders, but nothing seems to work for me. My test setup is reachable on the internet. I am not sure whether I am allowed to post the link to my test environment here; if I am, I will gladly provide it.

Further, I saw vectors in Pinecone only once, when the same data kept being repeated while I was changing the file. However, when I installed Flowise on my cloud server, the Pinecone index stayed empty and the chat box replied with generic answers such as "I do not have access to any document". While setting up Pinecone, I used 1536 as the dimension.

flowchart-flowise

Nexacybersecurity commented 5 months ago

I added a text box and placed some text in it, saved the flow, and asked the chat what this document is about. I got the response below. Also attached is the Pinecone page, which shows 0 vectors.

what is this doc about?

AI "I apologize for any confusion. However, as an AI language model, I don't have access to the specific document in question. If you could provide some context or specific details about the document, I'll do my best to help you understand its content or provide relevant information based on the details you provide."

flowchart-flowise

toi500 commented 5 months ago

  1. I would convert your index to serverless (it is free and better, and you only need to press the blue button).

  2. For text-embedding-3-small, you need to set the dimension to 512.

Screenshot 2024-03-31 172538

  3. The generic response is because there is no data at all in Pinecone.

toi500 commented 5 months ago

Also, once you solve the Pinecone misconfiguration, you could use the MR QA Chain, where you can add instructions that tell the model when to use one particular set of documents and when to use another:

Screenshot 2024-03-31 174626

Note: Use 1 index per topic. With Pinecone serverless it does not matter how many indexes you have, since you pay (almost nothing) only for what you use.

Here is the template: Template Multi Retrieval Chain - Chatflow.json

Nexacybersecurity commented 5 months ago

I did the first part as per the instructions, but I get the same issue again. I am going serverless now; it is asking for a credit card.

flowchart-flowise

Nexacybersecurity commented 5 months ago

Same situation even after going serverless and setting dimensions to 512. Now trying the third option with MR QA Chain.

flowchart-flowise

Nexacybersecurity commented 5 months ago

Hi, I uploaded your template too, but since my serverless Pinecone index still shows 0 vectors, there is nothing to retrieve even with your template. I saw some text on the Linux console where I started the server. It is below, in case it helps find the root cause.

root@Flowise-Temp:~# npx flowise start
2024-03-31 19:23:51 [INFO]: Starting Flowise...
2024-03-31 19:23:51 [INFO]: ⚡️ [server]: Flowise Server is listening at 3000
2024-03-31 19:23:51 [INFO]: 📦 [server]: Data Source is being initialized!
2024-03-31 19:23:54 [INFO]: 📦 [server]: Data Source has been initialized!
2024-03-31 19:24:09 [INFO]: 🖊 PUT /api/v1/chatflows/fc5e6d56-6495-4882-8b42-2658c96214fb
2024-03-31 19:24:21 [INFO]: ⬆️ POST /api/v1/internal-prediction/fc5e6d56-6495-4882-8b42-2658c96214fb
2024-03-31 19:24:22 [INFO]: [server]: Chatflow fc5e6d56-6495-4882-8b42-2658c96214fb added into ChatflowPool
2024-03-31 19:26:02 [INFO]: 🖊 PUT /api/v1/chatflows/fc5e6d56-6495-4882-8b42-2658c96214fb
2024-03-31 19:26:02 [INFO]: [server]: Chatflow fc5e6d56-6495-4882-8b42-2658c96214fb updated inSync=false in ChatflowPool
2024-03-31 19:26:09 [INFO]: ⬆️ POST /api/v1/internal-prediction/fc5e6d56-6495-4882-8b42-2658c96214fb
2024-03-31 19:26:09 [INFO]: [server]: Chatflow fc5e6d56-6495-4882-8b42-2658c96214fb added into ChatflowPool
2024-03-31 19:26:27 [INFO]: 🖊 PUT /api/v1/chatflows/a0e39982-7093-4341-b505-b756310b345e
2024-03-31 19:26:33 [INFO]: 🖊 PUT /api/v1/chatflows/a0e39982-7093-4341-b505-b756310b345e
2024-03-31 19:26:39 [INFO]: ⬆️ POST /api/v1/internal-prediction/a0e39982-7093-4341-b505-b756310b345e
2024-03-31 19:26:40 [INFO]: [server]: Chatflow a0e39982-7093-4341-b505-b756310b345e added into ChatflowPool
2024-03-31 19:26:45 [ERROR]: [server]: Error: Cannot read properties of undefined (reading 'text')
TypeError: Cannot read properties of undefined (reading 'text')
    at App.buildChatflow (/usr/lib/node_modules/flowise/dist/index.js:1961:24)
    at async /usr/lib/node_modules/flowise/dist/index.js:1265:13
2024-03-31 19:29:14 [INFO]: 🖊 PUT /api/v1/chatflows/a0e39982-7093-4341-b505-b756310b345e
2024-03-31 19:29:14 [INFO]: [server]: Chatflow a0e39982-7093-4341-b505-b756310b345e updated inSync=false in ChatflowPool
2024-03-31 19:29:20 [INFO]: ⬆️ POST /api/v1/internal-prediction/a0e39982-7093-4341-b505-b756310b345e
Warning: TT: undefined function: 32
Warning: TT: undefined function: 32
2024-03-31 19:29:22 [INFO]: [server]: Chatflow a0e39982-7093-4341-b505-b756310b345e added into ChatflowPool
2024-03-31 19:29:26 [ERROR]: [server]: Error: Cannot read properties of undefined (reading 'text')
TypeError: Cannot read properties of undefined (reading 'text')
    at App.buildChatflow (/usr/lib/node_modules/flowise/dist/index.js:1961:24)
    at async /usr/lib/node_modules/flowise/dist/index.js:1265:13
2024-03-31 19:29:57 [INFO]: 🖊 PUT /api/v1/chatflows/a0e39982-7093-4341-b505-b756310b345e
2024-03-31 19:29:57 [INFO]: [server]: Chatflow a0e39982-7093-4341-b505-b756310b345e updated inSync=false in ChatflowPool
2024-03-31 19:30:32 [INFO]: ⬆️ POST /api/v1/internal-prediction/fc5e6d56-6495-4882-8b42-2658c96214fb
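A console dump like this is easier to read once it is tallied by request type. The following is only a rough sketch, not part of Flowise: it assumes each entry starts with a `YYYY-MM-DD HH:MM:SS [LEVEL]:` prefix as in the output above, and the names `summarize` and `SAMPLE` are made up for illustration. It counts requests per endpoint and collects the `[ERROR]` entries. The sample is four lines copied from the output above.

```python
import re
from collections import Counter

# Matches entries such as:
# 2024-03-31 19:24:21 [INFO]: <message>
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\]: (.*)$")
# Matches an HTTP method followed by a path inside the message.
REQUEST_RE = re.compile(r"\b(GET|POST|PUT|DELETE|PATCH) (/\S+)")


def summarize(log_text):
    """Return (requests per (method, endpoint), list of (timestamp, error message))."""
    requests = Counter()
    errors = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # stack-trace lines and warnings have no timestamp prefix
        timestamp, level, message = match.groups()
        if level == "ERROR":
            errors.append((timestamp, message))
        request = REQUEST_RE.search(message)
        if request:
            method, path = request.groups()
            # Drop the trailing chatflow id so requests group by endpoint.
            endpoint = path.rsplit("/", 1)[0]
            requests[(method, endpoint)] += 1
    return requests, errors


SAMPLE = """
2024-03-31 19:26:27 [INFO]: 🖊 PUT /api/v1/chatflows/a0e39982-7093-4341-b505-b756310b345e
2024-03-31 19:26:39 [INFO]: ⬆️ POST /api/v1/internal-prediction/a0e39982-7093-4341-b505-b756310b345e
2024-03-31 19:26:40 [INFO]: [server]: Chatflow a0e39982-7093-4341-b505-b756310b345e added into ChatflowPool
2024-03-31 19:26:45 [ERROR]: [server]: Error: Cannot read properties of undefined (reading 'text')
"""

if __name__ == "__main__":
    counts, problems = summarize(SAMPLE)
    for (method, endpoint), count in sorted(counts.items()):
        print(method, endpoint, count)
    for timestamp, message in problems:
        print("ERROR at", timestamp, "->", message)
```

Run over the full output, a tally of this kind would list only chatflow saves (`PUT /api/v1/chatflows`) and chat requests (`POST /api/v1/internal-prediction`), which are the only request types present in this excerpt.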

Nexacybersecurity commented 5 months ago

Finally, I have realized that my data is not going to Pinecone, no matter what I do. I configured every available embedding model with its dimension: text-embedding-3-small | 512, text-embedding-3-large | 3072, and text-embedding-ada-002 | 1536. But the issue remains the same: I am unable to upsert my data to Pinecone.
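The model and dimension pairs above rest on one rule: the Pinecone index dimension has to equal the length of the vectors the embedding model actually returns. Below is a small sketch of that check. The helper names are hypothetical, and the table holds the default output sizes of the three OpenAI models. Note that text-embedding-3-small returns 1536 values by default, so a 512-dimension index only matches if the embedding step is also configured to request 512 (the text-embedding-3 models accept a shortened size, ada-002 does not).

```python
# Default output sizes of the OpenAI embedding models named in this thread.
DEFAULT_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

# Only the text-embedding-3 models accept a custom (shortened) output size.
SUPPORTS_CUSTOM_DIMENSIONS = {"text-embedding-3-small", "text-embedding-3-large"}


def embedding_dimension(model, requested=None):
    """Length of the vectors the model returns for the given settings."""
    default = DEFAULT_DIMENSIONS[model]
    if requested is None:
        return default
    if model not in SUPPORTS_CUSTOM_DIMENSIONS:
        raise ValueError(f"{model} does not accept a custom dimension")
    if not 1 <= requested <= default:
        raise ValueError(f"requested dimension must be between 1 and {default}")
    return requested


def dimensions_match(model, index_dimension, requested=None):
    """True when the index dimension equals the embedding vector length."""
    return embedding_dimension(model, requested) == index_dimension


if __name__ == "__main__":
    print(dimensions_match("text-embedding-ada-002", 1536))
    print(dimensions_match("text-embedding-3-small", 512))
    print(dimensions_match("text-embedding-3-small", 512, requested=512))
```

A mismatch here would explain rejected upserts, but it does not explain an index that stays at 0 vectors when no upsert is ever sent.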

toi500 commented 5 months ago

I just checked it and it works for me. Try the ADA 002 embedding (text-embedding-ada-002) with dimension 1536.

Screenshot 2024-03-31 220504

Screenshot 2024-03-31 220608

Nexacybersecurity commented 5 months ago

Tried it again, not working for me bro. I am not sure what I am doing wrong. I did the setup twice, once with Docker and once with Node.js. My server is hosted on Hetzner, and there is no issue with the internet connection either. It runs Linux, Ubuntu 22 LTS I think. Would you be kind enough to check my server for any error or step I missed, if I am allowed to share my email or server address here?

Nexacybersecurity commented 5 months ago

What version are you using? I think I will now install a previous version and check everything again.

Nexacybersecurity commented 5 months ago

Another failure: I installed flowise@1.6.1, and the data is still not being upserted or uploaded to Pinecone.

toi500 commented 5 months ago

My instance is up to date, from yesterday.

Just a quick question: What system message do you get from Flowise when you upsert the data?

Screenshot 2024-03-31 221443

Screenshot 2024-03-31 231422

Nexacybersecurity commented 5 months ago

Basically, once I have set up the entire flow, I ask the chat box a question like "what is this doc about?" and it responds with generic text such as "as an AI I do not understand, please provide more context", etc.

Only once was I able to upsert my data, for one text or PDF file (I don't remember which). But after I changed the file, it kept showing me responses from the previous file, so my troubleshooting journey started from there. Up until that point I saw some vectors in my Pinecone account too, but after that, total nada.

Nexacybersecurity commented 5 months ago

Ohh man, I was missing this point in the entire process. Sorry for my ignorance and thanks a lot for your time. It was the green button that I needed to press to upsert the data. I watched 3 videos on YouTube and all of them said that once everything is set up, you just talk to the chat box and it will do everything itself. I just fired up the laptop and upserted the data, and now I can see 37 vectors. Thanks a lot brother, I really appreciate your help.

Just a quick question: I work in compliance and want to chat with regulatory frameworks in order to extract requirements and obligations from them. After this workflow is set up properly, GPT-4 and Google Gemini Advanced will have full coverage of the PDF, right? Up until this point, I was doing everything manually by giving snippets to both AIs in bits and pieces. Also, is there any alternative approach if this is not what I am looking for?

toi500 commented 5 months ago

No worries, my friend. Happy to help.

That's the point: once the data is in the vector store, your models will know what the hell you are talking about :)
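The order that mattered in this thread (upsert first, then chat) can be shown with a toy in-memory store. This is only an illustration, not how Flowise or Pinecone work internally: `ToyVectorStore` is a made-up class, `embed` is a bag-of-words stand-in for a real embedding model, and the two sample chunks are invented.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self):
        self.records = []  # list of (vector, original text)

    def upsert(self, chunks):
        """Embed and store the chunks; returns the total number of stored vectors."""
        for chunk in chunks:
            self.records.append((embed(chunk), chunk))
        return len(self.records)

    def query(self, question, top_k=1):
        """Return the top_k stored chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    # Chatting before any upsert: the store is empty, so nothing can be retrieved.
    print(store.query("what does the policy cover?"))
    # Hypothetical chunks, standing in for the split document.
    store.upsert(["This policy covers data retention.", "Invoices are due in 30 days."])
    print(store.query("what does the policy cover?"))
```

With an empty store the retrieval step hands the model no context at all, which matches the generic "I don't have access to the document" answers seen earlier.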

Also, remember to disconnect (or remove) the loader and splitter nodes from the chain since you will not need them until you want to upsert data again.

Screenshot 2024-04-01 000825

For your purpose, check the template I sent you. You could divide your data per regulatory framework or topic and let the model know when to use each one. A model is only as good as the quality of the data it has.

toi500 commented 5 months ago

You will need to test it out to see what is working better for you (4 indexes or just 1 with all the data). You can also use 1 index with custom metadata (but forget about this for now).

If you want the 4 indexes approach, I made this template for you: Template (Multi Retrieval Chain) Chatflow_v2.json

Screenshot 2024-04-01 011242

On Pinecone:

  1. Create a new project.
  2. Set up 4 indexes (001, 002, 003 and 004) for "ADA 002" with dimension 1536.
  3. Upsert the data to them as you did today, using a dummy chatflow like this one:

Screenshot 2024-04-01 005150

  4. Then set everything up in the template.
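For anyone who prefers a script over the Pinecone console, the index setup in the steps above could be sketched roughly as follows. Assumptions that are not from this thread: the `pinecone` Python package (version 3 or later) is installed, the metric is cosine, and the serverless cloud and region are `aws` / `us-east-1`. The helper names are made up. The plan is built as plain data first so it can be checked without touching the account.

```python
def build_index_plan(names=("001", "002", "003", "004"), dimension=1536, metric="cosine"):
    """Describe the indexes to create; 1536 matches text-embedding-ada-002."""
    # metric="cosine" is an assumption; the thread does not name a metric.
    return [{"name": name, "dimension": dimension, "metric": metric} for name in names]


def create_indexes(plan, api_key, cloud="aws", region="us-east-1"):
    """Create every planned index that does not exist yet (serverless)."""
    # Imported lazily so the plan can be inspected without the package installed.
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=api_key)
    existing = set(pc.list_indexes().names())
    for item in plan:
        if item["name"] in existing:
            continue  # leave existing indexes untouched
        pc.create_index(
            name=item["name"],
            dimension=item["dimension"],
            metric=item["metric"],
            # cloud and region are assumptions; pick the ones your project uses.
            spec=ServerlessSpec(cloud=cloud, region=region),
        )


if __name__ == "__main__":
    for item in build_index_plan():
        print(item)
    # create_indexes(build_index_plan(), api_key="YOUR_API_KEY")  # needs a real key
```

The data still has to be upserted into each index afterwards, as described in step 3.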

About the chunk size (the size at which the data is split into "documents"): there is no right answer here, since that depends entirely on your dataset.

If, for example, your data has a lot of charts, you will want a big chunk size so there is less chance that 1 chart gets divided into 2 "documents", losing a bit of meaning. I would leave it at 1000 if you do not know what I am talking about :)

This is a document, so you can understand me:

Screenshot 2024-04-01 011809
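To make the chunk size point concrete, here is a minimal sketch using a plain fixed-width character split. That is a simplification chosen for illustration, not the splitter Flowise uses, and `split_text` and `chunks_spanned` are made-up helpers. It shows a 200-character block that lands across two chunks at size 1000 but stays whole at size 2000.

```python
def split_text(text, chunk_size=1000):
    """Cut text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunks_spanned(text, block, chunk_size):
    """Number of chunks that the first occurrence of block is spread over."""
    start = text.index(block)
    end = start + len(block) - 1
    return end // chunk_size - start // chunk_size + 1


if __name__ == "__main__":
    # Hypothetical document: 900 characters of prose, a 200-character chart, more prose.
    chart = "|row|" * 40
    document = "A" * 900 + chart + "B" * 900

    for size in (1000, 2000):
        pieces = split_text(document, size)
        print(size, len(pieces), chunks_spanned(document, chart, size))
```

A larger chunk size keeps the block together at the cost of fewer, broader "documents", so it is worth testing a couple of sizes against your own data.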

Nexacybersecurity commented 5 months ago

No no, now I understand what you mean bro :))

Thanks a lot, I'll try to set it up. I have been looking for something like this for the past few weeks. Thanks once again for your help and support, God bless you.