OoriData / OgbujiPT

Client-side toolkit for using large language models, including where self-hosted
Apache License 2.0

Normalize metadata #81

Closed choccccy closed 3 months ago

choccccy commented 4 months ago

from #79:

It's been annoying me for a while that the metadata fields in our vector DB tables are each a bit different:

The ones in pgvector_data_doc use:

    tags TEXT[]                             -- tags associated with the chunk

And pgvector_message uses:

    metadata JSON                             -- additional metadata of the message

I think this second pattern is the one we should be using, so in effect it would just be an update to ogbujipt.embedding.pgvector_data_doc.

This change, from lists of strings to (JSON) dictionaries, is the entire branch.

This does come at the cost of tag-based searching; users are now expected to implement that themselves.
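
For anyone wondering what "implement it themselves" might look like, here is a minimal client-side sketch. It assumes search results come back as dicts with a `metadata` key, and that tags are kept under a `tags` key inside that metadata; both the `filter_by_tags` helper and that layout are hypothetical, not part of OgbujiPT.

```python
from typing import Any, Iterable


def filter_by_tags(rows: Iterable[dict[str, Any]],
                   required_tags: Iterable[str]) -> list[dict[str, Any]]:
    '''
    Keep only the rows whose metadata carries every required tag.

    Hypothetical helper: assumes each row is a dict with a 'metadata' entry,
    and that tags live under metadata['tags'] as a list of strings.
    '''
    wanted = set(required_tags)
    matches = []
    for row in rows:
        # Metadata may be missing or NULL in the DB; treat that as "no tags"
        metadata = row.get('metadata') or {}
        row_tags = set(metadata.get('tags', []))
        if wanted <= row_tags:  # all wanted tags are present on this row
            matches.append(row)
    return matches


# Stand-in rows, shaped the way this sketch assumes search results look
rows = [
    {'content': 'alpha', 'metadata': {'tags': ['news', 'draft']}},
    {'content': 'beta', 'metadata': {'tags': ['news']}},
    {'content': 'gamma', 'metadata': None},
]
news_drafts = filter_by_tags(rows, ['news', 'draft'])
```

Filtering after the fact like this only sees rows the vector search already returned, so a small result limit can hide matches. Filtering in the query itself (for example with PostgreSQL's `jsonb` containment operator `@>`, after casting the JSON column) would avoid that, but is not sketched here.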

uogbuji commented 3 months ago

@choccccy I'm good to land this, though I still want to discuss whether we want to do any sort of migration, or just warn people about the need to rebuild their vDBs, and leave them to it.
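
If a migration turns out to be worth doing, the per-row conversion could be as small as the sketch below. It assumes old `tags` values arrive as a list of strings (or `None`) and that we would want them preserved under a `tags` key in the new metadata; the `tags_to_metadata` name and that target layout are assumptions for discussion, not something decided in this thread.

```python
import json
from typing import Optional, Sequence


def tags_to_metadata(tags: Optional[Sequence[str]]) -> str:
    '''
    Convert an old-style tags value (TEXT[] read back as a list of strings)
    into a JSON string suitable for the new metadata column.

    Hypothetical helper: the {'tags': [...]} layout is an assumption.
    '''
    # A NULL tags column becomes an empty tag list rather than JSON null
    return json.dumps({'tags': list(tags or [])})


migrated = tags_to_metadata(['news', 'draft'])
```

A real migration would still need to read each existing row, write the converted value to the new column, and drop the old one, which is where the "just rebuild" option may end up being simpler.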