Ensure all files are taken into account

jerome-white commented 3 months ago

The vector store list protocol seems to "page" responses; by default returning only 20 items (see here). This PR updates the sync code to take this paging into account, going through all files in a store rather than the default number.

SmittieC commented 3 months ago

@jerome-white You can use pre-commits to fix the minor styling issue. Also just noting the failing test :pray:

jerome-white commented 3 months ago

Also just noting the failing test

The test that's failed has me stumped: I think there's a mismatch in the type that the test expects, and the type that OpenAI projects. The line in question:

https://github.com/dimagi/open-chat-studio/blob/524cd6fde7edcc269dd6c6cfcbd581e8b5a02db8/apps/assistants/sync.py#L178

The test feels vector_store_files should be a list; however, based on the OpenAI documentation, it should be a "SyncCursorPage" (see here). It's based on what I'm seeing from OpenAI that I drafted this PR. I did some testing on my own OpenAI account with various vector stores and SyncCursorPage's were always returned, even when the vector store was empty.

It looks like the test case may do some mocking (?). Irrespective, is handling a list -- along with SyncCursorPage's -- the intended behaviour?

jerome-white commented 3 months ago

Altering the test case with the OpenAI return type allows the test to pass. See this PR for details.

If altering the test in this way is acceptable, I'll merge those changes into this PR

SmittieC commented 3 months ago

Altering the test case with the OpenAI return type allows the test to pass. See this PR for details.

If altering the test in this way is acceptable, I'll merge those changes into this PR

Apologies for only getting back now. Yeah that PR's changes seems fine :+1:

dimagi / open-chat-studio

Ensure all files are taken into account #478