Remove `sort`, `sort_reversed`, and `seed` from the protocol

dglazkov commented 1 year ago

Instead of having a special kind of sort, consider sending a random vector:

import numpy as np

import polymath

# Load a library and build a random query vector with the same dimensionality
# as text-embedding-ada-002 embeddings (1536).
library = polymath.Library(filename="libraries/wdl-library.json")
query_vector = np.random.rand(1536)
query = polymath.Library.base64_from_vector(query_vector)

# Ask for the bits most similar to the random vector.
result = library.query(
    version=1,
    query_embedding=query,
    count=1000,
    query_embedding_model="openai.com:text-embedding-ada-002",
    sort="similarity")
print('random test\n\n')
for bit in result.bits:
    print(bit.text)

dglazkov commented 1 year ago

Here's my intuition: I don't think we're going to use these. Let's just remove them?

@jkomoros WDYT?

dglazkov commented 1 year ago

Maybe add them back as we find use cases.

jkomoros commented 1 year ago

Yeah, I agree we can remove them. sort_reversed doesn't actually have a use case; it's something that might be useful in the future if there's a sort where you want the reverse order, but that's hypothetical.

In the future we might want to support other kinds of sorts, like similarity-per-token, but we can cross that bridge when we come to it.

There is a similar distinction between "pick a random point in the embedding space and return all of the bits closest to it" and "give me a totally random collection of bits." The latter is theoretically useful for questions like "What are some concepts in this library?" It's also useful for scraping content from the library (see #94) by repeatedly fetching random bits. And anyway, we could just make it "if you don't pass a query_embedding, we return a random set of results up to the content length limit."
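The two flavors of "random" can be sketched with plain NumPy. This is only an illustration: the toy `embeddings` matrix and the helper names `nearest_to_random_point` and `random_collection` are made up here and are not part of polymath.

```python
import numpy as np


def nearest_to_random_point(embeddings, count, rng):
    """Pick a random point in the embedding space; return indices of the closest rows."""
    query = rng.random(embeddings.shape[1])
    # Cosine similarity between the random query and every row.
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
    similarity = embeddings @ query / norms
    # Highest similarity first.
    return np.argsort(-similarity)[:count]


def random_collection(embeddings, count, rng):
    """Return a uniformly random set of row indices, ignoring the embedding values."""
    count = min(count, len(embeddings))
    return rng.choice(len(embeddings), size=count, replace=False)


rng = np.random.default_rng(0)
embeddings = rng.random((100, 8))  # toy stand-in for a library's bit embeddings

clustered = nearest_to_random_point(embeddings, 5, rng)  # neighbors of one random point
scattered = random_collection(embeddings, 5, rng)        # unrelated random picks
```

The first helper returns bits that are close to each other (they all sit near the same random point), while the second returns bits with no relationship to one another, which is what a "no query_embedding" request would presumably want.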

The seed is only useful if you do want a random sort (it's not obvious that we do; the random embedding use case is probably a better way of doing that) and you want a way to express "don't cache the request I just did, give me a different random sort" by passing a different seed.
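A rough sketch of what the seed buys you, using only the standard library. The `random_sort` helper and the bit IDs are hypothetical and only meant to show the semantics, not how polymath implements it.

```python
import random


def random_sort(bit_ids, seed=None):
    """Shuffle bit_ids; the order is reproducible for a given non-None seed."""
    shuffled = list(bit_ids)
    # A dedicated Random instance keeps the shuffle independent of global state.
    # With seed=None it falls back to system randomness.
    random.Random(seed).shuffle(shuffled)
    return shuffled


bit_ids = ["a", "b", "c", "d", "e", "f"]

first = random_sort(bit_ids, seed=1)
repeat = random_sort(bit_ids, seed=1)   # same seed: identical order, safe to cache
second = random_sort(bit_ids, seed=2)   # different seed: a fresh shuffle
```

The same seed always yields the same order, so a cached response stays valid; passing a different seed is how a client would ask for a new shuffle (which will usually, though not necessarily, differ).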

I imagine in the future that there will be use cases for different sorts, but for now it's probably best to just rip out the machinery given that we don't need it right now, in the interest of keeping semantics simpler.

jkomoros commented 1 year ago

@dglazkov Wanderer doesn't use sort=random anymore, does it? If not, I might just start ripping this out.

dglazkov commented 1 year ago

Yep, it doesn't use it anymore. I am using the random vector thingy now.

jkomoros commented 1 year ago

[x] Document the protocol somewhere, including all arguments and what they do. Now that Library.query() uses **kwds, it's hard to tell.
[x] Remove sort from protocol
[ ] Shouldn't Library.query() set sort=similarity, not _produce_query_result?
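For the last item, one possible shape is to spell the arguments out as keyword-only parameters on the public entry point, so the defaults live there and the signature documents itself. This is a hypothetical sketch: both functions below are stand-ins written for illustration, not polymath's actual code.

```python
def _produce_query_result(query_embedding, count, sort):
    # Hypothetical stand-in: just echoes back what it was asked to do.
    return {"query_embedding": query_embedding, "count": count, "sort": sort}


def query(*, version=1, query_embedding=None, count=None, sort="similarity"):
    """Hypothetical public entry point with every argument and default spelled out."""
    if version != 1:
        raise ValueError(f"unsupported version: {version}")
    # The default for sort is applied here, not inside the helper.
    return _produce_query_result(query_embedding, count, sort)


result = query(query_embedding="abc", count=10)
```

With explicit parameters, an unknown argument fails loudly with a `TypeError` instead of being silently swallowed by `**kwds`.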

dglazkov / polymath

Remove `sort`, `sort_reversed`, and `seed` from the protocol #81