Closed · BChip closed this 1 year ago
@BChip i think switching the base example to gpt4 is probably a very good idea! thanks for doing this analysis. Do you want to open that PR? happy to land it and give you credit :)
Thank you @jeffchuber -- I have submitted a PR: #1116
I tracked this down to an issue in the example code where we were ingesting every empty line. This leads to a degenerate HNSW graph, sometimes tanking retrieval quality.
This is now fixed in https://github.com/chroma-core/chroma/pull/1203
This was difficult to track down, and points to the need for better tooling and checks for this kind of data.
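The fix in the linked PR amounts to not ingesting blank lines as chunks. A minimal sketch of that kind of guard (the function name and line-based chunking are my assumptions, not the example's actual code):

```python
def chunk_lines(text: str, min_chars: int = 1) -> list[str]:
    """Split a document into line-level chunks, dropping blank lines.

    Ingesting empty lines produces many identical, meaningless embeddings,
    which degrades the HNSW index and tanks retrieval quality -- the bug
    described above.
    """
    return [
        line.strip()
        for line in text.splitlines()
        if len(line.strip()) >= min_chars
    ]
```

With this guard, `chunk_lines("Officer Rivera was 22.\n\n\nNext line.")` keeps only the two non-empty lines.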
What happened?
As a new user of ChromaDB, I immediately went to the examples and started playing around with the Chat With Your Documents example.
I loaded the state of the union data, and it created chroma_storage and the SQLite database without issue.
In the main.py code, it explicitly states to use gpt-3.5-turbo:
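(The snippet in question was a screenshot; as a rough sketch, with the helper name and prompt wording being my assumptions, only the hard-coded "gpt-3.5-turbo" mirrors the example:)

```python
def build_chat_request(question: str, context: str,
                       model: str = "gpt-3.5-turbo") -> dict:
    """Sketch of the payload main.py sends to the OpenAI chat endpoint.

    Only the "gpt-3.5-turbo" default reflects what the example states;
    the rest of the shape is assumed for illustration.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Answer the question using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    }
```

Switching models later is just a matter of changing that one string, e.g. `build_chat_request(q, ctx, model="gpt-4")`.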
I then ran main.py and provided my OpenAI key (which has access to both 3.5-turbo and 4). I asked questions that I would think are very straightforward and easy to understand and answer. It cannot answer them at all.
Example 1 GPT 3.5-turbo
As you can see, the answer is terrible and nowhere close to what is shown in the example's readme. The references to the source documents are just as bad: none of those lines mention the pandemic at all.
Example 2 GPT 3.5-turbo
Another poor response.
Example 3 GPT 3.5-turbo
I thought this one would be the easiest, because the answer is clearly stated on line 490 of state_of_the_union_2022.txt:
Officer Rivera was 22.
I changed the model to gpt-4 in main.py to try this out:
I got much better responses and far better references to the source.
Example 1 GPT-4
This lines up perfectly. On line 15 in state_of_the_union_2023:
Congratulations to the longest-serving Leader in the history of the United States Senate, Mitch McConnell. Where are you, Mitch?
Example 2 GPT-4
As you can see, it answered Rivera's age perfectly and the reference is perfect.
Obviously, there is a problem with 3.5-turbo; using 4 gives great results. Perhaps something changed with 3.5-turbo and it's not getting the context correctly? I would assume 3.5-turbo could handle simple questions like this if it were given the context correctly.
If you have any questions, please let me know.
Versions
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
chromadb>=0.4.4
openai
tqdm
Relevant log output
No response
EDIT
I did more digging on the context being sent and noticed that the context differs between runs: sometimes it includes Officer Rivera's age and sometimes it does not. In this example it does, but 3.5-turbo still doesn't answer the question well, which is not ChromaDB's fault.
Here is an example of the same question, but with horrible context:
Overall, it seems ChromaDB's retrieved context is sometimes flaky, and when it's not, GPT-3.5 cannot answer the simplest question even when it's given the answer.
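A small helper like this (hypothetical, not part of the example) makes the flakiness visible by dumping what the vector store actually returned on each run, so empty or irrelevant chunks stand out:

```python
def summarize_retrieval(documents: list[str], distances: list[float]) -> list[str]:
    """Format each retrieved chunk with its rank and distance so that
    empty or off-topic context is easy to spot across repeated queries."""
    lines = []
    for rank, (doc, dist) in enumerate(zip(documents, distances), start=1):
        # Flag chunks that are blank after stripping whitespace.
        flag = "  <-- empty chunk" if not doc.strip() else ""
        lines.append(f"#{rank} dist={dist:.3f} {doc[:60]!r}{flag}")
    return lines
```

Printing these lines before every model call would have shown immediately whether a bad answer came from bad context or from the model itself.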