Closed YuriyGavrilov closed 1 week ago
How about supporting joins with a dimension table (a fake source is one type of dimension table)? I think we can extend this requirement to any source, e.g. a join with Jdbc:
transform {
  JoinWithSource {
    join_on = "source.id = type_bin.item_id"
    source = [
      Jdbc {
        url = "jdbc:mysql://localhost/test?serverTimezone=GMT%2b8"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "root"
        password = "123456"
        query = "select * from type_bin"
      }
    ]
  }
}
Or a join with a fake source:
transform {
  JoinWithSource {
    join_on = "source.id = fake.c_int"
    source = [
      FakeSource {
        row.num = 5
        schema {
          fields {
            c_string = string
            c_tinyint = tinyint
            c_smallint = smallint
            c_int = int
            c_bigint = bigint
            c_float = float
            c_double = double
          }
        }
      }
    ]
  }
}
Then we can use the SQL transform to filter the data you want.
Or join with the SQL transform:
env {
  parallelism = 10
  job.mode = "BATCH"
}
source {
  Jdbc {
    url = "jdbc:mysql://localhost/test?serverTimezone=GMT%2b8"
    driver = "com.mysql.cj.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    table_path = "testdb.table1"
    query = "select * from testdb.table1"
    split.size = 10000
  }
  FakeSource {
    row.num = 5
    schema {
      fields {
        c_string = string
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
      }
    }
  }
}
transform {
  sql {
    query = "select * from table1 join table2 on table1.id = table2.id"
  }
}
sink {
  Console {}
}
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.
Description
Hi all, following the short discussion below, I created this issue.
https://github.com/apache/seatunnel/discussions/7746
The idea and goal is to copy an entire Postgres database (or a similar system) to another Postgres instance, with data masking or fake data generation for sensitive attributes. It is good to know that FakeSource offers many random generators, but at the moment I don't know whether they work in a transform. The other good news is that dynamic compilation is available for completely custom cases.
What do you think?
Usage Scenario
Some users may want to use a transform for masking and fake data generation. The real-world case is synchronizing data from a prod environment to a test environment, with some predefined options applied at the user's request.
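As a rough illustration of this scenario, here is a minimal sketch of a prod-to-test job that masks two sensitive columns with the existing SQL transform. It is not a confirmed solution to this request: the hosts, credentials, the `public.customers` table, and its columns are all hypothetical, and the option names `source_table_name` / `result_table_name` and the `CONCAT` / `LEFT` functions are assumed to be available in the SeaTunnel version in use (option names may differ between releases).

```hocon
# Sketch only: hosts, credentials, table, and column names are hypothetical.
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:postgresql://prod-host:5432/appdb"
    driver = "org.postgresql.Driver"
    user = "reader"
    password = "******"
    query = "select id, full_name, email from public.customers"
    # Assumed option name for naming this source's output table.
    result_table_name = "customers"
  }
}

transform {
  Sql {
    source_table_name = "customers"
    result_table_name = "customers_masked"
    # Keep the key, keep only the first letter of the name,
    # and replace the email with a fixed placeholder.
    query = "select id, CONCAT(LEFT(full_name, 1), '***') as full_name, 'redacted@example.com' as email from customers"
  }
}

sink {
  Jdbc {
    url = "jdbc:postgresql://test-host:5432/appdb"
    driver = "org.postgresql.Driver"
    user = "writer"
    password = "******"
    source_table_name = "customers_masked"
    query = "insert into public.customers (id, full_name, email) values (?, ?, ?)"
  }
}
```

This only covers static masking expressions for a single table. Generating realistic random values per row, as FakeSource does, is the part this issue asks about.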
Related issues
Supporting fake data generation in transformer