agronholm / sqlacodegen

Automatic model code generator for SQLAlchemy
Other
1.84k stars 240 forks source link

Error when reading vector column of pgvector #340

Open np-kyokyo opened 1 month ago

np-kyokyo commented 1 month ago

Things to check first

Sqlacodegen version

3.0.0rc5

SQLAlchemy version

2.0.31

RDBMS vendor

PostgreSQL

What happened?

Issue:

When attempting to add a new column to the recommend_model."Image" table using the following SQL command:

ALTER TABLE "recommend_model"."Image" ADD COLUMN  "embedding" vector(1408) NOT NULL;

then, we encountered an issue where the pgvector.sqlalchemy.vector.VECTOR type is not being properly recognized in the context of the get_adapted_type method. This results in the column type falling back to sqlalchemy.sql.type_api.UserDefinedType instead.

Using pgvector 0.3.0
Traceback (most recent call last):
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/bin/sqlacodegen", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/cli.py", line 101, in main
    outfile.write(generator.generate())
                  ^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 171, in generate
    self.fix_column_types(table)
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 651, in fix_column_types
    column.type = self.get_adapted_type(column.type)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlacodegen/generators.py", line 703, in get_adapted_type
    if new_coltype.compile(self.bind.engine.dialect) != compiled_type and (
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/type_api.py", line 1062, in compile
    return dialect.type_compiler_instance.process(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 960, in process
    return type_._compiler_dispatch(self, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/visitors.py", line 141, in _compiler_dispatch
    return meth(self, **kw)  # type: ignore  # noqa: E501
           ^^^^^^^^^^^^^^^^
  File "/Users/np/gitrep/apparel-ec-recommend/batch/.venv/lib/python3.12/site-packages/sqlalchemy/sql/compiler.py", line 7262, in visit_user_defined
    return type_.get_col_spec(**kw)
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'UserDefinedType' object has no attribute 'get_col_spec'
def get_adapted_type(self, coltype: Any) -> Any:
    ...
    coltype = new_coltype
    if supercls.__name__ != supercls.__name__.upper():
        break

Analysis

The VECTOR type is recognized, but since it is an uppercase type, the loop continues to check for any camel case types. As a result, UserDefinedType is found, but since it does not have the get_col_spec method, an error occurs.

Potential Fixes

  1. Avoid to use UserDefinedType when get adapted type:

    if supercls.__visit_name__ == "user_defined":
       break
    coltype = new_coltype

    UserDefinedTypeが発見された場合、すでにnew_coltypeがあった場合はそちらを優先する方針がよさそう。 It seems a good approach to prioritize an existing detected new_coltype if UserDefinedType is found.

    It seem to be below:

                if supercls.__visit_name__ == "user_defined" and new_coltype is not None:
                    continue
    
                ...
    
                adapted_coltype = new_coltype
  2. Add Special Handling for VECTOR: Include a specific check for the VECTOR type:

    coltype = new_coltype
    if supercls.__name__ == "VECTOR":
       break

    VECTORの時点で現状、キャメルケースがくることはない。→しかし、今後来るようになる可能性はある? Currently, there are no camel case types coming after VECTOR. However, it is possible that they might come in the future.

  3. Remove the Uppercase Check: Replace the existing check with a break statement, allowing VECTOR to be recognized:

    coltype = new_coltype
    # if supercls.__name__ != supercls.__name__.upper():
    #     break

    → Due to the impact on other areas, this approach is rejected.

Output on success

embedding: Mapped[Any] = mapped_column(VECTOR(1408))

References

Commit where if supercls.name != supercls.name.upper() was introduced

ralated issue

https://github.com/agronholm/sqlacodegen/issues/300

What is the difference between Text and TEXT types in SQLAlchemy?

ref. https://docs.sqlalchemy.org/en/20/core/type_basics.html#generic-camelcase-types

Removing if supercls.__name__ != supercls.__name__.upper(): break results in differences, such as parts previously outputting as Text now appearing as TEXT.

Database schema for reproducing the bug

(WIP)

np-kyokyo commented 1 month ago

~I'm dealing with this issue...~ I've been too busy to handle it.

agronholm commented 1 month ago

Did you install the project with the pgvector extra?

np-kyokyo commented 1 month ago

@agronholm yes. this is minimal requirements.lock to reproduce the issue.

# generated by rye
# use `rye lock` or `rye sync` to update this lockfile
#
# last locked with the following flags:
#   pre: false
#   features: []
#   all-features: false
#   with-sources: false
#   generate-hashes: false

inflect==7.3.1
    # via sqlacodegen
more-itertools==10.3.0
    # via inflect
numpy==2.0.0
    # via pgvector
pgvector==0.3.1
psycopg2==2.9.9
sqlacodegen==3.0.0rc5
sqlalchemy==2.0.31
    # via sqlacodegen
typeguard==4.3.0
    # via inflect
typing-extensions==4.12.2
    # via sqlalchemy
    # via typeguard
agronholm commented 1 month ago

Others have successfully tested the pgvector integration. Any idea why it's not working for you?

Tcintra commented 1 month ago

I downgraded pgvector to 0.2.5 and it worked fine, was having the same issue before!

agronholm commented 1 month ago

Pinpointing the specific commit where it starts to fail would probably help.