apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0
7.92k stars, 1.79k forks

[Feature][Transforms-V2] Simple transpose options for LLMs to rotate data for prompts #7848

Status: Open. YuriyGavrilov opened this issue 17 hours ago

YuriyGavrilov commented 17 hours ago

Search before asking

Description

Excel has an option named Transpose, which "rotates" data by 90 degrees: rows become columns and columns become rows. For example, the rows

1 2 3 4
5 6 7 8

become:

1 5
2 6
3 7
4 8
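A minimal sketch of the transpose operation itself, written in Python purely for illustration. The function name `transpose` is hypothetical and is not part of SeaTunnel; it only shows the row/column swap that Excel's Transpose performs (cell at row i, column j moves to row j, column i).

```python
def transpose(rows):
    """Swap rows and columns, like Excel's Transpose (hypothetical helper)."""
    if not rows:
        return []
    width = len(rows[0])
    # A rectangular grid is assumed; ragged rows would silently lose cells in zip().
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same length")
    # zip(*rows) yields one tuple per original column, i.e. the new rows.
    return [list(col) for col in zip(*rows)]


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
for row in transpose(grid):
    print(*row)
```

Running it prints the four rows `1 5`, `2 6`, `3 7`, `4 8`, matching the example above. Transposing twice returns the original grid.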

This would be a useful option when preparing data for LLMs: for categorizing data and for producing data about data (metadata), for example to build a knowledge base.

Usage Scenario

  1. Take a fixed-size sample from each table, e.g. 10 rows (it could equally be 100, 10,000, etc.).
  2. Parameterize the job to loop over all tables and take 10 rows from each one.
  3. Transpose each sample into the layout: Catalog, Schema, Table, Column, Row1, Row2, Row3, etc.
  4. Ask the LLM to find personal or sensitive data in the column names or values (using the fixed number of sample rows).
  5. Collect all sink output into a single table (or another target) with the columns Catalog, Schema, Table, Column, Table Category, Attribute Category, Sensitive (true or false), Description.
  6. Optionally, retrieve the database comments and include them in the prompt as well, e.g.: Catalog, Catalog Description (comment), Schema, Schema Description (comment), Table, Table Description (comment), Column Name, Column Description (comment), Row1, Row2, Row3 (as values).
  7. The sink output could then be sent to DataHub to add extra attributes to the table metadata, defining the security level that governs which users may access each table.
  8. Transpose could also serve other use cases, e.g. grouping by some dimensions with a fixed row size and collecting the output in different ways.
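Steps 2 to 4 above can be sketched as follows, assuming each table's sample arrives as a list of column names plus a list of rows. `transpose_sample` and `build_prompt` are hypothetical helpers invented for illustration, not SeaTunnel APIs; in SeaTunnel this logic would presumably live in a Transforms-V2 plugin, whose interface is not shown here. The prompt wording is likewise only an assumption.

```python
from typing import Any


def transpose_sample(catalog: str, schema: str, table: str,
                     columns: list[str], rows: list[list[Any]],
                     n: int = 10) -> list[dict[str, Any]]:
    """Hypothetical helper: turn the first n sample rows of one table into
    one record per column: Catalog, Schema, Table, Column, Row1..RowN."""
    sample = rows[:n]
    records = []
    for j, column in enumerate(columns):
        record = {"Catalog": catalog, "Schema": schema,
                  "Table": table, "Column": column}
        # Each sampled row contributes one value of this column.
        for i, row in enumerate(sample, start=1):
            record[f"Row{i}"] = row[j]
        records.append(record)
    return records


def build_prompt(record: dict[str, Any]) -> str:
    """Hypothetical helper: render one transposed record as an LLM prompt."""
    # Dicts keep insertion order, so values come out as Row1, Row2, ...
    values = [str(v) for k, v in record.items() if k.startswith("Row")]
    return (f"Column {record['Catalog']}.{record['Schema']}.{record['Table']}."
            f"{record['Column']} has sample values: {', '.join(values)}. "
            "Does the name or any value contain personal or sensitive data? "
            "Answer true or false with a short description.")


columns = ["id", "email"]
rows = [[1, "a@example.com"], [2, "b@example.com"], [3, "c@example.com"]]
for rec in transpose_sample("main", "public", "users", columns, rows, n=2):
    print(build_prompt(rec))
```

Producing one record per column keeps each prompt small and fixed-size (N sample values), and the LLM's answers can then be joined back on Catalog, Schema, Table and Column for step 5.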

Related issues

Additional steps or stages are needed to prepare data for LLM prompts.

Are you willing to submit a PR?

Code of Conduct

YuriyGavrilov commented 16 hours ago

What do you think? @hawk9821 @corgy-w

corgy-w commented 10 hours ago

cc @Hisoka-X