chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: Chroma DB - Chat with your documents example not performing as expected #1115

Closed. BChip closed this issue 1 year ago.

BChip commented 1 year ago

What happened?

As a new user of ChromaDB, I went straight to the examples and started playing around with the Chat With Your Documents example.

I loaded the State of the Union data. It created chroma_storage and the SQLite database perfectly.
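
For context, the loading step boils down to roughly the following (a sketch, not the example's exact code; the collection name is my assumption):

import chromadb

# Persist to the same directory the example creates ("chroma_storage").
client = chromadb.PersistentClient(path="chroma_storage")
collection = client.get_or_create_collection("documents")  # name assumed

with open("state_of_the_union_2022.txt") as f:
    lines = f.readlines()

# One document per line, with file/line metadata so answers can cite sources.
collection.add(
    documents=[line.strip() for line in lines],
    metadatas=[
        {"filename": "state_of_the_union_2022.txt", "line_number": i + 1}
        for i in range(len(lines))
    ],
    ids=[f"sotu-2022-{i + 1}" for i in range(len(lines))],
)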

The main.py code explicitly uses gpt-3.5-turbo:

response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=build_prompt(query, context),
    )
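
For reference, build_prompt stitches the retrieved lines into the chat messages, roughly like this (a paraphrase of the example's intent, not its exact prompt text):

def build_prompt(query: str, context: list[str]) -> list[dict]:
    # The system message tells the model to answer only from the provided excerpts.
    system = (
        "You are a helpful assistant. Answer the question using only the "
        "context provided. If the context does not contain the answer, say so."
    )
    # The retrieved document lines are joined and passed alongside the question.
    joined = "\n".join(context)
    user = f"Context:\n{joined}\n\nQuestion: {query}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]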

I then ran main.py and provided my OpenAI key (which has access to both gpt-3.5-turbo and gpt-4).

I asked questions that I thought were very straightforward and easy to answer. It could not answer them at all.

Example 1 GPT 3.5-turbo

Query: What was said about the pandemic?

Thinking...

Based on the given context, there is no specific mention of anything about the pandemic. The provided information talks about the strength of the nation, a unity agenda, a goal to reduce the cancer death rate, and requesting Congress to provide benefits and comprehensive healthcare for veterans affected by toxic exposures. There is no reference to the pandemic or any discussion about it. Therefore, it is safe to say that there is no mention about the pandemic in the context provided.

Source documents:
state_of_the_union_2022.txt: line 703
state_of_the_union_2022.txt: line 709
state_of_the_union_2022.txt: line 669
state_of_the_union_2022.txt: line 659
state_of_the_union_2022.txt: line 648

As you can see, the answer is terrible and nowhere close to what is shown in the example's README. The source document references are just as bad: none of those lines mention the pandemic at all.

Example 2 GPT 3.5-turbo

Query: What was said about ukraine?

Thinking...

Based on the provided context, there is no specific information about what was said about Ukraine. The context only mentions a unity agenda for the nation, the strength of the State of the Union, and the accomplishments of the American people in building a strong and prosperous nation. Therefore, it is not possible to provide a direct answer to the question.

Source documents:
state_of_the_union_2022.txt: line 669
state_of_the_union_2022.txt: line 703
state_of_the_union_2022.txt: line 707
state_of_the_union_2022.txt: line 709
state_of_the_union_2022.txt: line 679

Another poor response.

Example 3 GPT 3.5-turbo

Query: How old is Officer Rivera?

Thinking...

Based on the given context, there is no information provided about Officer Rivera's age. It seems to be unrelated to the content provided. Therefore, I am not able to determine how old Officer Rivera is based on the context given.

Source documents:
state_of_the_union_2022.txt: line 638
state_of_the_union_2022.txt: line 636
state_of_the_union_2022.txt: line 642
state_of_the_union_2022.txt: line 665
state_of_the_union_2022.txt: line 681

I thought this one was the easiest, because state_of_the_union_2022.txt clearly states on line 490: Officer Rivera was 22.

I changed the model to gpt-4 in main.py to try this out:

response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=build_prompt(query, context),
    )

I got better responses and much better references to the source documents.

Example 1 GPT-4

Query: who is the longest-serving leader in the history of the united states senate?

Thinking...

The longest-serving Leader in the history of the United States Senate, based on the provided context, is Mitch McConnell.

Mitch McConnell is explicitly mentioned and congratulated for this milestone in the context provided. No other Senators are credited with such a record in the text. McConnell served as Senate Majority Leader from 2015 to 2021 and has been Senate Minority Leader since January 2021.

Source documents:
state_of_the_union_2023.txt: line 15
state_of_the_union_2023.txt: line 5
state_of_the_union_2022.txt: line 365
state_of_the_union_2023.txt: line 17
state_of_the_union_2022.txt: line 1

This lines up perfectly. Line 15 of state_of_the_union_2023.txt reads: Congratulations to the longest-serving Leader in the history of the United States Senate, Mitch McConnell. Where are you, Mitch?

Example 2 GPT-4

Query: How old is Officer Rivera?

Thinking...

Based on the provided context, Officer Rivera was 22 years old. The passage states that Officer Rivera was 22. His partner, Officer Mora, was older than him at 27 years old. The two officers were serving at the New York City Police Department and chose to work the same streets they'd grown up in.

They were both of Dominican American heritage. Other than the information about their nationality, ages, and where they worked, there's no additional information provided within the context. The context also introduces Danielle Robinson from Ohio, who is linked to military families, but there doesn't seem to be a direct connection between Robinson and Officer Rivera or Mora.

Source documents:
state_of_the_union_2022.txt: line 490
state_of_the_union_2022.txt: line 488
state_of_the_union_2022.txt: line 484
state_of_the_union_2022.txt: line 492
state_of_the_union_2022.txt: line 624

As you can see, it answered Rivera's age correctly, and the references are spot on.

Obviously, there is a problem with gpt-3.5-turbo; using gpt-4 gives great results. Perhaps something changed with gpt-3.5-turbo and it's not getting the context correctly? I would assume gpt-3.5-turbo could handle simple questions like these if the context were given correctly.

If you have any questions, please let me know.

Versions

Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
chromadb>=0.4.4
openai
tqdm

Relevant log output

No response

EDIT

I did more digging on the context being sent, and I noticed it differs between runs. Sometimes the retrieved context includes Officer Rivera's age and sometimes it does not. In this example it does, but gpt-3.5-turbo still doesn't answer the question well, which is not ChromaDB's fault.
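
For anyone curious, the context lists below come from printing the query results before the chat call; a minimal sketch (query_texts and n_results are the chromadb 0.4 query parameters; collection is the handle from the loading step):

results = collection.query(query_texts=[query], n_results=5)
context = results["documents"][0]   # the top-5 retrieved lines for this query
sources = results["metadatas"][0]   # filename/line metadata for the citations
print(context)                      # the list shown in the transcripts below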

Query: How old is officer rivera? 

Thinking...

['Officer Rivera was 22.', 'Officer Mora was 27 years old.', 'I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera.', 'Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers.', 'Committed to military families like Danielle Robinson from Ohio.']

Based on the information provided, we know that Officer Mora was 27 years old. However, there is no direct information given about Officer Rivera's age in the context. We only know that both officers recently had funerals and they were both Dominican Americans who grew up on the same streets they patrolled.

Therefore, based solely on the given context, we cannot determine Officer Rivera's age. It is not mentioned anywhere how old Officer Rivera is in this particular scenario. So, I am not sure about Officer Rivera's age.

Source documents:
state_of_the_union_2022.txt: line 490
state_of_the_union_2022.txt: line 488
state_of_the_union_2022.txt: line 484
state_of_the_union_2022.txt: line 492
state_of_the_union_2022.txt: line 624

Here is an example of the same question, but the retrieved context is horrible:

Query: How old is officer rivera? 

Thinking...

['He didn’t know how to stop fighting, and neither did she.', 'Danielle says Heath was a fighter to the very end.', 'Tonight, Danielle—we are.', 'It’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.', 'Now is the hour.']

Based on the given context, there is no information provided regarding Officer Rivera's age. The context talks about fighting, DARPA, and the current hour, but none of these details give any indication of Officer Rivera's age. Therefore, I am not sure how old Officer Rivera is based solely on the provided information.

Source documents:
state_of_the_union_2022.txt: line 638
state_of_the_union_2022.txt: line 636
state_of_the_union_2022.txt: line 642
state_of_the_union_2022.txt: line 665
state_of_the_union_2022.txt: line 681

Overall, it seems the ChromaDB context is sometimes flaky, and when it's not, GPT-3.5 cannot answer the simplest question, even when it's given the answer.

jeffchuber commented 1 year ago

@BChip i think switching the base example to gpt4 is probably a very good idea! thanks for doing this analysis. Do you want to open that PR? happy to land it and give you credit :)

BChip commented 1 year ago

Thank you @jeffchuber -- I have submitted a PR: #1116

atroyn commented 1 year ago

I tracked this down to an issue in the example code where we were ingesting every empty line. This leads to a degenerate HNSW graph, sometimes tanking retrieval quality.

This is now fixed in https://github.com/chroma-core/chroma/pull/1203
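
The fix amounts to dropping blank lines at ingestion time so they never become identical, zero-information entries in the index; a minimal sketch of the idea (not the exact diff from the PR):

with open("state_of_the_union_2022.txt") as f:
    # Keep the original line numbers for citations, but skip empty lines so
    # they never get embedded and inserted into the HNSW graph.
    numbered = [(i, line.strip()) for i, line in enumerate(f, start=1)]
    numbered = [(i, text) for i, text in numbered if text]

collection.add(
    documents=[text for _, text in numbered],
    metadatas=[
        {"filename": "state_of_the_union_2022.txt", "line_number": i}
        for i, _ in numbered
    ],
    ids=[f"sotu-2022-{i}" for i, _ in numbered],
)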

This was difficult to track down, and points to the need for better tooling and checks for this kind of data.