Handling PG schemata properly

uogbuji commented 1 month ago

Up through 0.9.3, our PG embeddings helpers did not deal with schemata at all. In one project we needed to work with Supabase, where schemata had already been set up and we were enabling pgvector via the Supabase online dashboard. This exposed the problems with ignoring PG schemata.

After some study, here are a few notes:

The public schema is the default for user objects in PostgreSQL. When you create tables or other objects without specifying a schema, they are created in the first existing schema on the search_path, which by default is public (unless a schema named after the current user exists).
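To avoid depending on whatever search_path a given deployment (Supabase or otherwise) has configured, one option is for the helpers to schema-qualify every object they create or query. A minimal sketch; `quote_ident` and `qualified_name` are hypothetical helper names for illustration, not existing OgbujiPT functions:

```python
from typing import Optional


def quote_ident(name: str) -> str:
    '''Quote a PostgreSQL identifier: wrap it in double quotes, doubling any embedded ones.'''
    return '"' + name.replace('"', '""') + '"'


def qualified_name(name: str, schema: Optional[str] = None) -> str:
    '''Return a quoted identifier, schema-qualified if a schema is given.

    With schema=None the name is left unqualified, so PostgreSQL resolves it
    through search_path (by default "$user", public).
    '''
    if schema is None:
        return quote_ident(name)
    return f'{quote_ident(schema)}.{quote_ident(name)}'
```

For example, `qualified_name('embeddings', 'my_app')` yields `"my_app"."embeddings"`, which can be interpolated into a `CREATE TABLE` or `SELECT`. Note that quoted identifiers are case-sensitive, so callers need to pass names in the exact case they were created with.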

The pg_catalog schema is a special system schema that holds the system catalogs along with the built-in data types, functions, and operators. It is not meant for user objects.
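Because pg_catalog holds the system catalogs, it can also be queried to find out where an extension actually landed, e.g. which schema pgvector was installed into from the Supabase dashboard. A sketch assuming an asyncpg-style connection (anything with an awaitable `fetchval(query, *args)`); `extension_schema` is a hypothetical helper name:

```python
async def extension_schema(conn, extname: str = 'vector'):
    '''Return the name of the schema an extension is installed in, or None if absent.

    pg_extension is a system catalog in pg_catalog. Its extnamespace column is the
    OID of the extension's schema; casting to regnamespace then text gives the name.
    '''
    return await conn.fetchval(
        'SELECT extnamespace::regnamespace::text FROM pg_extension WHERE extname = $1',
        extname,
    )
```

If the returned schema is not on the connection's search_path, the pgvector column type has to be qualified with it when creating tables (e.g. `<schema>.vector(<dims>)`).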

For JSON codec setup, we'd use schema='pg_catalog' by default, to ensure that the codec is applied to the built-in json and jsonb types, which live in that schema.
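With asyncpg, codec registration goes through `Connection.set_type_codec`, whose `schema` argument defaults to `'public'`; since json and jsonb are built-in types, `schema='pg_catalog'` has to be passed explicitly. A minimal sketch (the function name `register_json_codecs` is hypothetical):

```python
import json


async def register_json_codecs(conn) -> None:
    '''Encode/decode json and jsonb values as Python objects on an asyncpg connection.

    Both types are built in and live in pg_catalog, so the schema is passed explicitly
    rather than relying on set_type_codec's default of schema='public'.
    '''
    for typename in ('json', 'jsonb'):
        await conn.set_type_codec(
            typename,
            encoder=json.dumps,
            decoder=json.loads,
            schema='pg_catalog',
        )
```

This can be called on a single connection after `asyncpg.connect(...)`, or passed as `init=register_json_codecs` to `asyncpg.create_pool(...)` so every pooled connection gets the codecs.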

PostgreSQL 15 changed the default permissions on the public schema. Previously, all users had CREATE privilege on the public schema by default. As of PostgreSQL 15 this is no longer the case: only the database owner, superusers, and roles explicitly granted CREATE can create objects there.
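Since CREATE on public can no longer be assumed, a helper could check the privilege up front and fail with a clearer message than a bare permission error at table-creation time. A sketch assuming an asyncpg-style connection with an awaitable `fetchval`; `ensure_can_create` is a hypothetical name:

```python
async def ensure_can_create(conn, schema: str = 'public') -> None:
    '''Raise PermissionError if the current role cannot create objects in schema.

    The two-argument form has_schema_privilege(schema, privilege) checks the
    current user. The schema must exist, or PostgreSQL raises an error.
    '''
    allowed = await conn.fetchval(
        "SELECT has_schema_privilege($1::text, 'CREATE')", schema)
    if not allowed:
        raise PermissionError(
            f'Current role lacks CREATE on schema {schema!r}; an admin could run: '
            f'GRANT CREATE ON SCHEMA {schema} TO <role>')
```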

Also, we should be using JSONB, not JSON, for metadata: jsonb supports GIN indexing and operators such as @> (containment), which gives us much richer metadata query capabilities.
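The containment operator @> is defined for jsonb but not json, and a GIN index on the column can serve it. A sketch of a metadata filter, assuming a hypothetical table with a jsonb column named `metadata`, a table name already quoted/qualified by the caller, and an asyncpg connection with a jsonb codec registered using json.dumps/json.loads (so the filter is passed as a plain dict):

```python
from typing import Any, Dict, List, Tuple


def metadata_filter_query(table_sql: str, meta_filter: Dict[str, Any]) -> Tuple[str, List[Any]]:
    '''Build a query for rows whose jsonb metadata contains every key/value in meta_filter.'''
    sql = (f'SELECT content, metadata FROM {table_sql} '
           'WHERE metadata @> $1::jsonb')
    return sql, [meta_filter]


def metadata_gin_index_sql(table_sql: str) -> str:
    # A default GIN index on a jsonb column supports @> as well as the ?, ?| and ?& key operators
    return f'CREATE INDEX ON {table_sql} USING GIN (metadata)'
```

Usage would look like `sql, args = metadata_filter_query('"my_app"."docs"', {'source': 'web'})` followed by `rows = await conn.fetch(sql, *args)`.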



Handling PG schemata properly #87