agronholm / sqlacodegen

Automatic model code generator for SQLAlchemy

Enhancement: Add Support for pgvector extension #300

Open KellyRousselHoomano opened 10 months ago

KellyRousselHoomano commented 10 months ago

Feature description

I propose enhancing sqlacodegen to include native support for the pgvector extension. Currently, the tool does not recognize the 'Vector' type of the pgvector extension. To address this, I have forked the repository and created a dedicated branch (feature-pgvector).

In this branch, I followed the same process used for earlier extensions such as "citext" and "geoalchemy2" to enable support for pgvector. The changes can be reviewed in the branch, and I have verified that pgvector is installed correctly when using the command:

pip install "git+https://github.com/hoomano/sqlacodegen.git@feature-pgvector#egg=sqlacodegen[pgvector]"

However, despite the successful installation, running the sqlacodegen command line to export models from a PostgreSQL database produces the following warning:

sqlacodegen/cli.py:81: SAWarning: Did not recognize type 'vector' of column 'embedding'
  metadata.reflect(engine, schema, not args.noviews, tables)
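For context, a warning like this usually means the reflected type name was not found in the dialect's name-to-type registry when `metadata.reflect()` ran (in SQLAlchemy's PostgreSQL dialect this is, to my understanding, the `ischema_names` dictionary), so installing the extra may not be enough if nothing registers the type before reflection. The sketch below is a simplified standard-library model of that lookup, not SQLAlchemy's or sqlacodegen's actual code; `ISCHEMA_NAMES`, `resolve_column_type`, and the placeholder classes are hypothetical names used only for illustration.

```python
import warnings


# Placeholder classes standing in for real column types (hypothetical).
class NullType: ...
class Integer: ...
class Vector: ...


# Simplified stand-in for a dialect's registry of reflected type names.
ISCHEMA_NAMES = {"integer": Integer}


def resolve_column_type(type_name, column_name, registry=ISCHEMA_NAMES):
    """Look up a reflected type name; warn and fall back when it is unknown."""
    try:
        return registry[type_name]
    except KeyError:
        warnings.warn(
            f"Did not recognize type '{type_name}' of column '{column_name}'"
        )
        return NullType


# Before registration: the lookup fails, a warning is emitted,
# and the column falls back to the null type.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    before = resolve_column_type("vector", "embedding")

# After registration: the same lookup resolves to the vector type.
ISCHEMA_NAMES["vector"] = Vector
after = resolve_column_type("vector", "embedding")
```

If this model matches what happens here, the branch would need to make sure the pgvector type is registered (for example, by importing the module that defines it) before reflection starts.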

Use case

The need for pgvector support in sqlacodegen arises from the growing adoption of Large Language Models (LLMs) and the desire to build a retrieval tool that uses pgvector to handle embeddings efficiently in a PostgreSQL database. Dedicated retrieval databases seem like overkill for some of these use cases.

With native support for the pgvector extension, sqlacodegen would let users generate models for PostgreSQL databases that use pgvector and take advantage of its capabilities, such as cosine distance for retrieval.

This feature not only addresses our immediate requirements but also makes sqlacodegen useful to a broader audience with similar use cases involving extension data types like pgvector's vector type.

Your collaboration and insights on this feature request are highly appreciated. 😁

agronholm commented 10 months ago

Would you create a PR for this?

KellyRousselHoomano commented 10 months ago

Sure! Here it is: https://github.com/agronholm/sqlacodegen/pull/301