dglazkov / polymath


More savvy ways of selecting chunks for the context #14

Open jkomoros opened 1 year ago

jkomoros commented 1 year ago

Now we will be selecting chunks from across a number of endpoints.

The simplest way to select chunks for the context is to split the available context tokens evenly across the number of sources, and then fill up that much context from each source. But some sources will have more similar chunks than others.
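A minimal sketch of that even-split strategy, assuming each source exposes chunks as `(similarity, token_count, text)` tuples already sorted by similarity (the names `sources` and `context_token_budget` are illustrative, not the actual polymath API):

```python
def even_split_context(sources, context_token_budget):
    """Give every source an equal slice of the token budget."""
    per_source_budget = context_token_budget // max(len(sources), 1)
    selected = []
    for chunks in sources.values():
        used = 0
        for similarity, token_count, text in chunks:
            if used + token_count > per_source_budget:
                break
            selected.append((similarity, text))
            used += token_count
    return selected
```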

Another approach is to fetch enough chunks from each endpoint that any one endpoint's chunks could fill the whole context on its own. Then merge them all together and take the most similar, walking down the list until all of the context is filled.
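A sketch of that merge-then-rank strategy under the same assumed chunk shape: pool a full context's worth of candidates from every endpoint and walk down the global similarity ranking until the budget is spent.

```python
def merged_context(sources, context_token_budget):
    """Pool all candidates and take the globally most similar chunks."""
    pooled = [chunk for chunks in sources.values() for chunk in chunks]
    pooled.sort(key=lambda chunk: chunk[0], reverse=True)  # most similar first
    selected, used = [], 0
    for similarity, token_count, text in pooled:
        if used + token_count > context_token_budget:
            continue  # skip chunks that don't fit, keep looking for ones that do
        selected.append((similarity, text))
        used += token_count
    return selected
```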

But this has two additional problems: 1) it might lead to one very chatty endpoint dominating all of the context, and 2) one relevant but verbose piece of context might take all the space.

For that reason, the ranking of chunks should probably take the overall similarity and divide it by the number of tokens, a bang-for-buck score per token. Second, we should twiddle the similarity based on the log of the number of tokens already selected from that source (or some other fall-off function). That means we'll try to select at least one chunk from each source unless its chunks really aren't good.
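A sketch of that "bang for buck plus fall-off" ranking. The exact penalty shape is an assumption: here the score is `similarity / token_count`, damped by the log of the tokens already taken from the same source, which nudges selection toward covering every source at least once.

```python
import math

def falloff_context(sources, context_token_budget):
    """Greedily pick the best-scoring chunk that still fits, penalizing
    sources that have already contributed a lot of tokens."""
    tokens_per_source = {name: 0 for name in sources}
    remaining = {name: list(chunks) for name, chunks in sources.items()}
    selected, used = [], 0
    while used < context_token_budget:
        best = None
        for name, chunks in remaining.items():
            for i, (similarity, token_count, text) in enumerate(chunks):
                if used + token_count > context_token_budget:
                    continue
                # Bang-for-buck, damped by how much this source has given already.
                score = (similarity / token_count) / (
                    1 + math.log1p(tokens_per_source[name]))
                if best is None or score > best[0]:
                    best = (score, name, i)
        if best is None:
            break  # nothing left that fits
        _, name, i = best
        similarity, token_count, text = remaining[name].pop(i)
        selected.append((similarity, text))
        tokens_per_source[name] += token_count
        used += token_count
    return selected
```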

Perhaps there should be multiple strategies that are all configurable.

Perhaps the runner of the final client should also be able to configure boosts for endpoints, based on how much they want each endpoint to contribute to the final result.
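A hypothetical configuration sketch for such boosts; the URLs and key names are illustrative, not the actual polymath config format. A boost would simply multiply whatever chunk score the selected strategy produces.

```python
# Hypothetical per-endpoint boosts set by the client runner.
ENDPOINT_BOOSTS = {
    "https://polymath.example.com/": 1.0,
    "https://another-host.example.com/": 0.5,  # contribute less to the result
}

def boosted_score(endpoint, score):
    """Scale a chunk's score by its endpoint's configured boost."""
    return score * ENDPOINT_BOOSTS.get(endpoint, 1.0)
```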

Related to #8.

jkomoros commented 1 year ago

Hmmm, investigating get_chunk_infos_for_library and how to handle sorted things, I realize that I think I based the design on something ChatGPT hallucinated: that dict keys in Python are in order.

That implies that the first order of business is to make libraries actually maintain their intended sort order, in a way that survives serialization and deserialization.
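One possible sketch of making the order explicit so it survives a JSON round trip, assuming chunks live in a dict keyed by chunk id (the "sort"/"content" field names are assumptions, not the actual library format): store the ordered ids alongside the chunk map rather than relying on key order.

```python
import json

def serialize_library(chunks_by_id, sorted_ids):
    """Write the intended order explicitly instead of trusting dict key order."""
    return json.dumps({
        "sort": sorted_ids,        # explicit chunk-id order
        "content": chunks_by_id,   # chunk id -> chunk data
    })

def deserialize_library(raw):
    """Rebuild the chunk map in the intended order on load."""
    data = json.loads(raw)
    return {chunk_id: data["content"][chunk_id] for chunk_id in data["sort"]}
```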