airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[destination-weaviate] PGVector embeddings interpreted as String? #39597

Open vade opened 1 month ago

vade commented 1 month ago

Connector Name

destination-weaviate

Connector Version

0.2.19

What step the error happened?

Configuring a new connector

Relevant information

While mapping source to destination, fields that already contain numerical arrays, such as PGVector fields, do not seem to be interpreted as embeddings that can be passed through to Weaviate as existing embeddings.


Relevant log output

No response


vade commented 1 month ago

After playing around a bit more, I was able to configure a single field, but I get this error:

In stream ozu_api_segment does contain embedding vector field visual_cinema_clip_embedding, but it is not a list of numbers. Please check your embedding configuration, the embedding vector field has to be a list of numbers of length 640 on every record.
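
For illustration, the rule stated in that message (a list of numbers of a fixed length on every record) can be sketched as below. This is a minimal sketch, not the connector's actual code: `is_valid_embedding` is a hypothetical name, and the assumption is that a PGVector value arriving as a string would fail a check of this kind.

```python
from numbers import Real


def is_valid_embedding(value, dimensions):
    """Return True if value is a list of exactly `dimensions` real numbers.

    Hypothetical helper for illustration only; the destination's real
    validation logic may differ.
    """
    if not isinstance(value, list) or len(value) != dimensions:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(x, Real) and not isinstance(x, bool) for x in value)


print(is_valid_embedding([0.1, 0.2, 0.3], 3))   # True
print(is_valid_embedding("[0.1,0.2,0.3]", 3))   # False: a string, not a list
```

The second call shows why a vector serialized as text would be rejected even though its contents look numeric.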

vade commented 1 month ago

This looks like it might be an error, oversight, or limitation of the Postgres source rather than the Weaviate connector:

And some ad hoc support to determine the PGVector precision would be needed, so that the field is mapped correctly and seen as a numeric array type with the right precision.
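
As a rough sketch of the kind of mapping meant here, assuming the source emits the vector column in pgvector's text form (for example `[1,2,3]`), the value could be converted to a numeric array before it reaches the destination. `parse_pgvector` and its `dimensions` parameter are hypothetical names used for illustration; nothing like this is confirmed to exist in either connector.

```python
import json


def parse_pgvector(text, dimensions=None):
    """Convert pgvector text form such as '[1,2,3]' into a list of floats.

    Hypothetical helper: assumes the text form is a bracketed,
    comma-separated list, which also parses as a JSON array.
    """
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError("expected a bracketed list of numbers")
    vector = [float(x) for x in values]
    # Optionally enforce the expected number of dimensions.
    if dimensions is not None and len(vector) != dimensions:
        raise ValueError(
            f"expected {dimensions} dimensions, got {len(vector)}"
        )
    return vector


print(parse_pgvector("[1,2,3]"))  # [1.0, 2.0, 3.0]
```

Passing `dimensions=640` would reject any record whose vector length differs from the one configured for the stream in this issue.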

marcosmarxm commented 1 month ago

Thanks for reporting the issue, @vade. I added this to the AI team backlog to take a look.