apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Feature][Transform] Add embedding transform #7526

Closed corgy-w closed 1 week ago

corgy-w commented 2 weeks ago

Description

Add an Embedding transform that converts the values of selected fields into vectors by calling an embedding model.

Reference config:

transform {
  Embedding {
    source_table_name = "fake"
    embedding_model_provider = QIANFAN
    model = bge_large_en
    api_key = xxxxxxx
    secret_key = xxxxxxx
    api_path = "https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop/embeddings"
    vectorization_fields {
        book_intro_vector = book_intro  # target field = source field, as in the Copy transform
    }
    result_table_name = "embedding_output"
  }
}
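
To make the intended semantics of `vectorization_fields` concrete, here is a minimal Python sketch of the per-row behavior the config above suggests. It is not SeaTunnel code: `apply_embedding` and `fake_embed` are hypothetical names, `fake_embed` is a toy stand-in for the real provider call, and passing `None` through for missing values is an assumption rather than something specified in this issue.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Embedder = Callable[[str], List[float]]


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for the embedding provider call (not a real API)."""
    # Toy deterministic vector: character count, vowel count, word count.
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels), float(len(text.split()))]


def apply_embedding(
    rows: List[Row],
    vectorization_fields: Dict[str, str],
    embed: Embedder = fake_embed,
) -> List[Row]:
    """Add one vector field per mapping entry, keyed as target -> source."""
    output: List[Row] = []
    for row in rows:
        new_row = dict(row)  # keep the original fields untouched
        for target, source in vectorization_fields.items():
            value: Optional[object] = row.get(source)
            # Assumption: missing or null source values yield a null vector.
            new_row[target] = None if value is None else embed(str(value))
        output.append(new_row)
    return output


rows = [{"book_id": 1, "book_intro": "A short intro"}]
result = apply_embedding(rows, {"book_intro_vector": "book_intro"})
```

Each output row keeps `book_intro` and gains `book_intro_vector`, mirroring how the Copy transform adds a new field next to the original one.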

Usage Scenario

No response

Related issues

No response
