Closed jazelly closed 1 day ago
@jazelly is the `pageContent` of the associated document empty?
@timothycarambat the `pageContent` is not empty. It has content, but it does not include the script content, e.g.
VIEW ALL\nsql\nASSIGN TO AN ACCOUNT\nThe account must already exist.\nsql\n
Notice the `sql` in the `pageContent`, which is supposed to be a SQL command. The LLM makes up answers when we ask a question related to it, since the prompt contains no reference to the real command.
Issue faced with local deployment as well. LLM responses are poor.
This has nothing to do with the deployment method or the RAG structure. The RAG results are bad because the scraper is returning poor information from the documents. As @jazelly mentions, it seems that some non-text blocks are not returned or parsed by the LangChain parser, which is where the problem lies.
This issue might be better suited to the LangChain community.
For us, the current workaround is simply to write our own scraper that downloads these documents and uploads them to anything-llm via its APIs.
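For anyone attempting the same workaround, here is a minimal sketch of the conversion step only, not a definitive implementation. It assumes the page body is fetched in Confluence storage format (for example via the REST API with `expand=body.storage`) and that code snippets live in `code` macros whose text sits in a CDATA section, which would explain why only the `sql` language label survives in the current output. The function name `storage_to_text` and the placeholder scheme are hypothetical. Fetching the pages and uploading the result to anything-llm are not shown.

```python
import re
from html import unescape

FENCE = "`" * 3  # Markdown code fence, built here to keep the source readable

# Confluence storage format wraps code in <ac:structured-macro ac:name="code">.
CODE_MACRO = re.compile(
    r'<ac:structured-macro[^>]*ac:name="code"[^>]*>(.*?)</ac:structured-macro>',
    re.DOTALL,
)
LANGUAGE = re.compile(
    r'<ac:parameter ac:name="language">(.*?)</ac:parameter>', re.DOTALL
)
BODY = re.compile(
    r"<ac:plain-text-body><!\[CDATA\[(.*?)\]\]></ac:plain-text-body>", re.DOTALL
)
BLOCK_END = re.compile(r"</(?:p|h[1-6]|li|tr|div)>|<br\s*/?>")
TAG = re.compile(r"<[^>]+>")


def storage_to_text(storage: str) -> str:
    """Flatten storage-format markup to text while keeping code macro bodies."""
    blocks = []

    def stash(match):
        # Pull the language label and the CDATA body out of one code macro.
        macro = match.group(1)
        lang = LANGUAGE.search(macro)
        body = BODY.search(macro)
        label = lang.group(1).strip() if lang else ""
        code = body.group(1).strip("\n") if body else ""
        blocks.append(f"{FENCE}{label}\n{code}\n{FENCE}")
        # A placeholder protects the code from the tag stripping below.
        return f"\n@@CODE{len(blocks) - 1}@@\n"

    text = CODE_MACRO.sub(stash, storage)
    text = BLOCK_END.sub("\n", text)   # block-level closes become line breaks
    text = unescape(TAG.sub("", text))  # drop remaining tags, decode entities
    lines = [line.strip() for line in text.splitlines()]
    text = "\n".join(line for line in lines if line)
    # Restore the code blocks only after the prose has been cleaned up.
    for i, block in enumerate(blocks):
        text = text.replace(f"@@CODE{i}@@", block)
    return text


if __name__ == "__main__":
    sample = (
        "<h2>VIEW ALL</h2>"
        '<ac:structured-macro ac:name="code" ac:schema-version="1">'
        '<ac:parameter ac:name="language">sql</ac:parameter>'
        "<ac:plain-text-body><![CDATA[SELECT * FROM accounts WHERE id < 10;]]>"
        "</ac:plain-text-body></ac:structured-macro>"
        "<p>The account must already exist.</p>"
    )
    print(storage_to_text(sample))
```

The regex approach is deliberately simple and will not handle nested macros; a real scraper would probably want a proper XML parser with the `ac:` namespace declared.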
How are you running AnythingLLM?
Local development
What happened?
After the Confluence connector scrapes Confluence documents, the document bodies are not fully saved in the JSON files under `storage`. After embedding them, the workspace does not provide useful information as expected. For example, we have a Confluence doc containing some code snippets and would like to ask questions that retrieve them. The code snippets are lost during scraping, however, which causes the LLM to respond with only basic info.
I am not sure whether this is a limitation of the Atlassian API, but users would surely expect more than just some basic info from the Confluence documents.
Are there known steps to reproduce?
No response