Do you happen to see the vector count on the sidebar increase post-embedding? 🤔
I did not see any indication like the one you described. When I select a local document to embed from the workspace, I see a popup to confirm it. After I confirm, close the popup, and come back to workspace settings -> Document, the previously selected doc still has a green icon in front of it. So is the doc somehow NOT embedded?
After further testing I was able to get 2 docs embedded. Here is what I observed when embedding fails: 1) OpenAI 401 error (authentication problem). I had to enter the OpenAI key every time I started the frontend, so I need to double-check that the OpenAI key is set up correctly. I also have the OpenAI key set in the server's env file.
2) Database issues:
Inserting vectorized chunks into LanceDB collection.
[Error: LanceDBError: Append with different schema: original=Field(id=0, name=vector, type=fixed_size_list:float:1536)
Field(id=1, name=id, type=string)
Field(id=2, name=url, type=string)
Field(id=3, name=title, type=string)
Field(id=4, name=docAuthor, type=string)
Field(id=5, name=description, type=string)
Field(id=6, name=docSource, type=string)
Field(id=7, name=chunkSource, type=string)
Field(id=8, name=published, type=string)
Field(id=9, name=wordCount, type=double)
Field(id=10, name=token_count_estimate, type=double)
Field(id=11, name=text, type=string)
new=Field(id=0, name=vector, type=fixed_size_list:float:1536)
Field(id=1, name=id, type=string)
Field(id=2, name=url, type=string)
Field(id=3, name=title, type=string)
Field(id=4, name=description, type=string)
Field(id=5, name=published, type=string)
Field(id=6, name=wordCount, type=double)
Field(id=7, name=token_count_estimate, type=double)
Field(id=8, name=text, type=string)
]
addDocumentToNamespace LanceDBError: Append with different schema: original=Field(id=0, name=vector, type=fixed_size_list:float:1536)
Field(id=1, name=id, type=string)
Field(id=2, name=url, type=string)
Field(id=3, name=title, type=string)
Field(id=4, name=docAuthor, type=string)
Field(id=5, name=description, type=string)
Field(id=6, name=docSource, type=string)
Field(id=7, name=chunkSource, type=string)
Field(id=8, name=published, type=string)
Field(id=9, name=wordCount, type=double)
Field(id=10, name=token_count_estimate, type=double)
Field(id=11, name=text, type=string)
new=Field(id=0, name=vector, type=fixed_size_list:float:1536)
Field(id=1, name=id, type=string)
Field(id=2, name=url, type=string)
Field(id=3, name=title, type=string)
Field(id=4, name=description, type=string)
Field(id=5, name=published, type=string)
Field(id=6, name=wordCount, type=double)
Field(id=7, name=token_count_estimate, type=double)
Field(id=8, name=text, type=string)
Failed to vectorize website-github.com/article-_AUTOMATIC1111_stable-diffusion-webui_wiki_API.json
I had to check the server log to find this error.
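The two schema dumps in that error differ only in their columns: the existing table has `docAuthor`, `docSource`, and `chunkSource`, while the new chunks do not. Below is a minimal Python sketch for pulling that difference out of the error text. It assumes the message keeps the `original=... new=...` layout shown in the log above; the helper names are hypothetical and not part of LanceDB or anything-llm.

```python
import re


def schema_field_names(dump: str) -> list[str]:
    # Each column in the dump looks like "Field(id=N, name=<column>, type=...)".
    return re.findall(r"name=(\w+)", dump)


def diff_append_schema(error_text: str) -> dict[str, list[str]]:
    """Compare the original= and new= schema dumps in one error message."""
    before_new, after_new = error_text.split("new=", 1)
    original = schema_field_names(before_new.split("original=", 1)[1])
    # The server log prints the error twice; stop at the next "original=" dump.
    new = schema_field_names(after_new.split("original=", 1)[0])
    return {
        "missing_in_new": [name for name in original if name not in new],
        "extra_in_new": [name for name in new if name not in original],
    }


# Shortened sample in the same layout as the log above.
SAMPLE_ERROR = (
    "LanceDBError: Append with different schema: "
    "original=Field(id=0, name=vector, type=fixed_size_list:float:1536)\n"
    "Field(id=1, name=docAuthor, type=string)\n"
    "Field(id=2, name=text, type=string)\n"
    "new=Field(id=0, name=vector, type=fixed_size_list:float:1536)\n"
    "Field(id=1, name=text, type=string)"
)

print(diff_append_schema(SAMPLE_ERROR))
# {'missing_in_new': ['docAuthor'], 'extra_in_new': []}
```

Run against the full error above, the `missing_in_new` list should come out as `docAuthor`, `docSource`, `chunkSource`, which points at the existing table having been written with a different document-metadata shape than the chunks now being appended.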
Are you running this in Docker or in development mode? If you are running this in Docker, the correct env file placement is `docker/.env`. Otherwise it's `server/.env`, and if in development it's `server/.env.development`.
Confusing, I know. Will be resolved by #281.
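The three placements can be captured in a small lookup. This is a sketch only: the mode labels (`docker`, `default`, `development`) and the `env_file_for` helper are hypothetical, while the paths are the ones given above, relative to the repository root.

```python
from pathlib import Path

# Hypothetical mode labels; the paths are the placements described above.
ENV_FILE_BY_MODE = {
    "docker": Path("docker/.env"),
    "default": Path("server/.env"),  # the "otherwise" case
    "development": Path("server/.env.development"),
}


def env_file_for(mode: str, repo_root: Path = Path(".")) -> Path:
    """Return where the env file belongs for the given run mode."""
    if mode not in ENV_FILE_BY_MODE:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(ENV_FILE_BY_MODE)}"
        )
    return repo_root / ENV_FILE_BY_MODE[mode]


path = env_file_for("development")
print(path, "exists" if path.exists() else "is missing")
```

Checking `path.exists()` before starting the dev server is a quick way to catch keys that were only ever entered through the UI.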
I was running in development mode.
I assume you were talking about the LanceDB issue?
Having to save the API keys (OpenAI, Pinecone, etc.) in the frontend after already entering them in the server .env seems bizarre.
The next time I restart the frontend, I have to re-enter the keys and config. If there is anything I can help you debug, let me know. I was running on localhost.
Yeah, so if you are running in development mode and make an edit to the code, the backend will hot-reload, which also means that your `process.env` is unset! That is why the keys keep clearing for you.
You need to set up a `server/.env.development` and put the proper keys in it. That way you won't have to re-enter your credentials when the server reloads or restarts.
# Example /server/.env.development
SERVER_PORT=3001
CACHE_VECTORS="true"
JWT_SECRET="some-random-JWT-string" # Please generate a random string at least 12 chars long.
###########################################
######## LLM API SELECTION ################
###########################################
LLM_PROVIDER='openai'
OPEN_AI_KEY=sk-ABC123OPENAI
OPEN_MODEL_PREF='gpt-3.5-turbo'
###########################################
######## Vector Database Selection ########
###########################################
# Enable all below if you are using vector database: Pinecone.
VECTOR_DB="pinecone"
PINECONE_ENVIRONMENT=us-xxxx-gcp
PINECONE_API_KEY='123-456'
PINECONE_INDEX=your-index-name
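Since a hot reload clears whatever was typed into the UI, it helps to confirm the file itself has everything before restarting. A minimal Python sketch, assuming the plain `KEY=VALUE` layout of the example above: it does a simplified parse (not dotenv's full quoting rules) and lists keys that are absent or empty for the selected `LLM_PROVIDER` and `VECTOR_DB`. The required-key table and the helper names are hypothetical, read off this example rather than taken from anything-llm's own validation.

```python
from pathlib import Path

# Assumed required keys, read off the example file above; this is not
# anything-llm's own validation logic.
REQUIRED_KEYS = {
    "base": ["SERVER_PORT", "JWT_SECRET", "LLM_PROVIDER", "VECTOR_DB"],
    "openai": ["OPEN_AI_KEY", "OPEN_MODEL_PREF"],
    "pinecone": ["PINECONE_ENVIRONMENT", "PINECONE_API_KEY", "PINECONE_INDEX"],
}


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips comment lines, strips quotes and trailing ' # ...' comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value[:1] in ("'", '"'):
            # Quoted value: keep what sits between the quotes, drop anything after.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            value = value.split(" #", 1)[0].strip()
        env[key] = value
    return env


def env_problems(env: dict[str, str]) -> list[str]:
    """List keys that are absent or empty for the chosen LLM provider and vector DB."""
    wanted = (
        REQUIRED_KEYS["base"]
        + REQUIRED_KEYS.get(env.get("LLM_PROVIDER", ""), [])
        + REQUIRED_KEYS.get(env.get("VECTOR_DB", ""), [])
    )
    problems = [key for key in wanted if not env.get(key)]
    # The example file asks for a JWT secret of at least 12 characters.
    if env.get("JWT_SECRET") and len(env["JWT_SECRET"]) < 12:
        problems.append("JWT_SECRET (shorter than 12 chars)")
    return problems


env_path = Path("server/.env.development")
if env_path.exists():
    print(env_problems(parse_env(env_path.read_text())) or "all expected keys present")
else:
    print(f"{env_path} not found")
```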
As for the LanceDB issue, I think it might be assuming a LanceDB environment exists, but the vector DB selection keeps changing, and that is what is going wrong. I'll hold off on commenting on that issue for now. The primary issue for you is the hot-reloading of the backend wiping out your `process.env` values.
@timothycarambat I do have the keys and configuration set up in the server's .env file. But when the frontend is started, the key settings are empty and I have to enter them manually again.
But I'll retest.
Steps:
1) Add a PDF file to hotdir with Python watch.py running.
2) Check the terminal output of watch.py to make sure the file was processed.
3) Go back to the anything-llm tab in the browser, where a workspace is already set up and running.
4) Go to the workspace settings and click/enable embedding of the newly added document.
5) Ask GPT a question about information that is only available in the document.
6) GPT is not able to answer the question.
Please let me know if anything is missing from my steps.
Thanks!