Open jensenbox opened 4 months ago
I know you call them transformers but for some reason in my mind, they just seem closer to data generators than something that transforms :)
Hi! Thank you for your feedback. I will consider the naming, but both names are debatable, so we need to choose the more user-friendly one. Maybe I will raise a vote in the future)
I named it transformer because some transformers change the original data rather than generate new values. For instance, in the latest beta you can generate a random email while keeping part of the original value:
- schema: "public"
  name: "account"
  transformers:
    - name: "RandomEmail"
      params:
        column: "email"
        engine: "hash"
        keep_original_domain: true
        local_part_template: "{{ first_name | lower }}.{{ last_name | lower }}.{{ .random_string | trunc 10 }}"
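To illustrate the "changes the original rather than generating from scratch" point, here is a minimal Python sketch of the idea behind that config: the domain of the original email is kept, and the local part is rebuilt from the name columns plus a short suffix derived from a hash of the original value. This is not Greenmask's implementation; the function name `pseudonymize_email`, the choice of SHA-256, and the suffix handling are assumptions made purely for illustration.

```python
import hashlib


def pseudonymize_email(original: str, first_name: str, last_name: str,
                       suffix_len: int = 10) -> str:
    """Build a replacement email that keeps the original domain.

    Hypothetical helper: it only mimics the *shape* of the config above
    (lower-cased names, a 10-character suffix, original domain kept).
    """
    if "@" not in original:
        raise ValueError(f"not an email address: {original!r}")

    # Split on the last "@" so the domain is preserved untouched.
    _local, _, domain = original.rpartition("@")

    # Hash the original value so the same input always yields the same
    # output (assumed here to be the point of a hash-based engine).
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    suffix = digest[:suffix_len]

    return f"{first_name.lower()}.{last_name.lower()}.{suffix}@{domain}"


if __name__ == "__main__":
    print(pseudonymize_email("Jane.Doe@example.com", "Jane", "Doe"))
```

Because the suffix depends only on the original value, repeated runs map the same address to the same replacement, which is what makes this a transformation of existing data rather than pure generation.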
I can use RealAddress for the address_line_1 and random data for the others but it would be nice to have city be something interesting.
Good note, agreed. I will try to make RealAddress more useful according to your feedback.
Well, I have an idea: people could provide their own addresses, or any other database of values, for instance in a JSON representation. Greenmask would use that data for mapping to the columns. For instance:
- schema: "public"
  name: "account_address"
  transformers:
    - name: "RandomDataFromFile"
      params:
        file: "/path/to/your/db.json"
        columns:
          - name: "address_line_1"
            value: "{{ db.address_line_1 }}"
          - name: "city"
            value: "{{ db.city }}"
And the file might look something like this:
[
  {
    "address_line_1": "val1",
    "address_line_2": "val2",
    "city": "val3",
    "postal_code": "val4",
    "region": "val5",
    "country": "val6"
  }
]
Why this way? I think this could be used not only for addresses but for many purposes, since it lets users define their own functional dependencies between attributes in the database they provide.
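The functional-dependency point can be sketched in a few lines of Python: one record is picked per row, and every mapped column is filled from that same record, so the city always matches the street address. `RandomDataFromFile` is only a proposal in this thread, so nothing here reflects an existing Greenmask API; `transform_row`, `COLUMN_MAP`, and the sample records are made up for illustration, and the records are inlined as a JSON string instead of being read from a file.

```python
import json
import random

# Illustrative stand-in for the contents of the proposed db.json file.
DB_JSON = """
[
  {"address_line_1": "1 Example Street", "city": "Springfield",
   "postal_code": "00001"},
  {"address_line_1": "2 Sample Avenue", "city": "Shelbyville",
   "postal_code": "00002"}
]
"""

RECORDS = json.loads(DB_JSON)

# Table column -> field in the record (mirrors the `columns` list above).
COLUMN_MAP = {"address_line_1": "address_line_1", "city": "city"}


def transform_row(row: dict, records: list, column_map: dict,
                  rng: random.Random) -> dict:
    """Return a copy of `row` with mapped columns taken from ONE record."""
    record = rng.choice(records)  # a single pick keeps columns consistent
    new_row = dict(row)           # leave the caller's row untouched
    for column, field in column_map.items():
        new_row[column] = record[field]
    return new_row


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    row = {"id": 7, "address_line_1": "real street", "city": "real city"}
    print(transform_row(row, RECORDS, COLUMN_MAP, rng))
```

Picking the record once per row, rather than once per column, is the design choice that preserves the dependency between attributes; unmapped columns such as `id` pass through unchanged.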
On the naming, I was actually thinking the same thing when I wrote it, and I see both sides for sure. There are data generators and data transformers (or mutators). When I thought about how the documentation would be written, it did not make sense to put them in two sections either, so there should be one good name that covers both.
I asked the AI God what it thought:
As for the file idea: for ease of use, you could even replace the file with a YAML array of values. They would of course have to evaluate down to strings, but with YAML anchors you could reuse them in other parts of the configuration file.
I know you call them transformers but for some reason in my mind they just seem closer to data generators than something that transforms :)
Anyway, I am working on a table that has the address broken up like:
I can use RealAddress for the address_line_1 and random data for the others but it would be nice to have city be something interesting.