OoriData / OgbujiPT

Client-side toolkit for using large language models, including where self-hosted
Apache License 2.0

PGVector implementation #33

kaischuygon closed 1 year ago

Aidan-Reese commented 1 year ago

Updated my .env to resemble this:

DISCORD_TOKEN=MEE7
LLM_HOST=http://1.2.3.4
LLM_PORT=4321
LLM_TEMP=3.14159265359
DB_USER=TonyStark
DB_PASSWORD=StarkIndustries
DB_NAME=Friday
DB_HOST=5.6.7.8
DB_PORT=8765

[Edited this example to be a bit more copy-friendly - @choccccy] The DB information can be found under Oori-PGvector on 1pass.

uogbuji commented 1 year ago

Just noting, since this is a public project, that the secrets @Aidan-Reese mentions above are for an internal PG instance we manage at Oori. Others using PGVector will need to provision their own, and update environment accordingly (either through a .env file as above, or some other mechanism).
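For anyone provisioning their own instance, a minimal sketch of picking those settings up in Python might look like the following. The variable names come from the .env example above; the `pg_settings_from_env` helper and the shape of the returned dict are assumptions for illustration, not part of OgbujiPT.

```python
import os

# Names taken from the .env example earlier in this thread
REQUIRED_VARS = ('DB_USER', 'DB_PASSWORD', 'DB_NAME', 'DB_HOST', 'DB_PORT')


def pg_settings_from_env(env=None):
    '''
    Hypothetical helper: collect PostgreSQL connection settings from a mapping
    (os.environ by default) and fail loudly if any are missing or empty.
    '''
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f'Missing environment variables: {", ".join(missing)}')
    return {
        'user': env['DB_USER'],
        'password': env['DB_PASSWORD'],
        'database': env['DB_NAME'],
        'host': env['DB_HOST'],
        'port': int(env['DB_PORT']),  # Environment values are always strings
    }
```

Note that a .env file is not read automatically; the values have to reach the process environment first, whether exported by the shell, injected by the container runtime, or loaded by a tool such as python-dotenv.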

Aidan-Reese commented 1 year ago

Error that we're running into:

DETAIL:  Could not open extension control file "/usr/share/postgresql/15/extension/vector.control": No such file or directory.
HINT:  The extension must first be installed on the system where PostgreSQL is running.

Yesterday we were able to connect to the db with an image that we thought included PGvector, but this error implies otherwise. Will need to find out what's really going on here...

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS embeddings (
    id bigserial primary key,
    embedding vector({len(e_lorem_ipsum)}), -- embedding vector field size
    content text NOT NULL, -- text content of the chunk
    permission text, -- permission of the chunk
    tokens integer, -- number of tokens in the chunk
    title text, -- title of file
    page_numbers integer[], -- page number of the document that the chunk is found in
    tags text[] -- tags associated with the chunk
);
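The `{len(e_lorem_ipsum)}` placeholder suggests the statement is built in Python as an f-string, with the vector size taken from a sample embedding. A rough sketch of that idea follows; `embeddings_ddl` is a hypothetical helper, and the column list is trimmed for brevity.

```python
def embeddings_ddl(sample_embedding, table_name='embeddings'):
    '''
    Hypothetical helper: build the CREATE TABLE statement with the vector
    column sized to match a sample embedding.
    '''
    dimension = len(sample_embedding)
    if dimension < 1:
        raise ValueError('Sample embedding must not be empty')
    # Identifiers can't be sent as query parameters, so only accept a plain name
    if not table_name.isidentifier():
        raise ValueError(f'Unsafe table name: {table_name!r}')
    return (
        f'CREATE TABLE IF NOT EXISTS {table_name} (\n'
        '    id bigserial primary key,\n'
        f'    embedding vector({dimension}),  -- embedding vector field size\n'
        '    content text NOT NULL  -- text content of the chunk\n'
        ');'
    )
```

Since the dimension is baked into the table definition, embeddings from a model with a different output size would need their own table or a migration.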

choccccy commented 1 year ago

ok, getting the issues with installing the vector extension seemingly no matter where you run it from:

[Image not preserved]

choccccy commented 1 year ago

restarting the postgres container seems to have fixed things; Kai thinks this is because we hadn't restarted it since switching the image we're pulling to one that has pgvector installed.
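One way to check what the running server actually has, rather than what we think the image ships: PostgreSQL's `pg_available_extensions` view lists the extensions whose control files are present on the server, and its `installed_version` column is non-null once `CREATE EXTENSION` has been run in the current database. A sketch, assuming a DB-API style cursor with `%s` placeholders (psycopg2-like); `vector_extension_status` is a made-up name:

```python
def vector_extension_status(cursor, name='vector'):
    '''
    Hypothetical helper: report whether an extension is available on the
    server and whether it is installed in the current database.
    Expects a DB-API cursor that uses %s placeholders (an assumption).
    '''
    cursor.execute(
        'SELECT default_version, installed_version '
        'FROM pg_available_extensions WHERE name = %s',
        (name,),
    )
    row = cursor.fetchone()
    if row is None:
        # No control file on the server, so CREATE EXTENSION would fail
        return 'unavailable'
    _default_version, installed_version = row
    return 'installed' if installed_version is not None else 'available'
```

A result of 'unavailable' right after changing the image would be consistent with the container still running the old one.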

choccccy commented 1 year ago

alright, got a super rudimentary return from a table that we created. no vector functionality yet, but we're happy with it for now (https://github.com/OoriData/OgbujiPT/commit/d0077d64b7ed9f11d64f6b3c0a2cdbcc50e33332). We should investigate sorting out list weirdness (with pgvector-python?), and once we have a robust example program, we should take some of the more boilerplate functionality and make it into methods of the PGv class in embedding_helper.py
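On the list weirdness: it isn't spelled out here what exactly goes wrong, but one possible workaround that doesn't depend on pgvector-python is to send the embedding in pgvector's text form, such as `[1,2,3]`, and cast it to `vector` in the query. A sketch under that assumption; `to_pgvector_literal` is a hypothetical helper, not something in embedding_helper.py:

```python
def to_pgvector_literal(embedding):
    '''
    Hypothetical helper: render a sequence of numbers in pgvector's text
    input format, i.e. comma-separated values inside square brackets.
    '''
    # float() also normalizes ints and NumPy scalars to plain Python floats
    values = [float(x) for x in embedding]
    if not values:
        raise ValueError('Embedding must not be empty')
    return '[' + ','.join(repr(v) for v in values) + ']'
```

The resulting string can then be passed as an ordinary text parameter, for example `INSERT INTO embeddings (embedding, content) VALUES (%s::vector, %s)` with a psycopg2-style driver; the placeholder syntax differs for other drivers.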