Closed grigi closed 4 months ago
Hi @grigi, thanks for the PR!
Could you work around this by overriding the schema to something like:
plugins:
extractors:
- name: tap-example
schema:
my_stream:
my_uuid_column:
type: ["string", "null"]
format: uuid
The UUID format is already handled:
nb This is if you're using Meltano, otherwise you could pass a custom catalog file.
Thanks for the rapid response. I've tried that (using meltano, yes) but I can't get it to work.
It updates the schema generated in the catalog cache .meltano/run/tap-mysql/tap.properties.json
, but when I invoke the extractor it overrides the schema in the output.
Manually updating the discovered catalog and providing it, it also gets ignored. I'm not sure if this is a bug in meltano or pipelinewise-tap-mysql.
There's a high chance that I'm missing something :shrug:
I think the change here can be problematic for the general use case, since a non-uuid text value can really break pipelines if the target column is UUID.
That said, we could move the add a new user setting to check right before https://github.com/MeltanoLabs/target-postgres/blob/fd881a638fd39649f570ee9d24e6db3531b741e2/target_postgres/connector.py#L431-L438
if we should even attempt to adapt column types to the schema. That way, we'd skip this problem entirely if the user considers mismatches are acceptable.
Something like check_column_types
or similar, with a description like Fail the sync if the incoming stream schema doesn't match the type of an existing column
.
Wdyt?
cc @visch
That said, the fact that the tap doesn't support schema overrides is bad so I dug a bit and submitted a PR https://github.com/transferwise/pipelinewise-tap-mysql/pull/186.
I did test it a bit but let's hope I didn't do something terrible and the Wise folks accept it 😅.
In the meantime it can be tested by updating the pip_url
. This is the project I validated these changes with:
plugins:
extractors:
- name: tap-mysql
variant: transferwise
pip_url: "'pipelinewise-tap-mysql @ git+https://github.com/edgarrmondragon/pipelinewise-tap-mysql.git@allow-schema-overrides'"
config:
user: root
host: 'localhost'
database: mysql
filter_dbs: mysql
select:
- "mysql-tests.*"
schema:
mysql-tests:
my_uuid:
type: [string, "null"]
format: uuid
inclusion: available
Ooh thanks for that. That PR makes it all work for me :-) It's certainly the more correct solution.
I think the change here can be problematic for the general use case, since a non-uuid text value can really break pipelines if the target column is UUID.
That said, we could move the add a new user setting to check right before
if we should even attempt to adapt column types to the schema. That way, we'd skip this problem entirely if the user considers mismatches are acceptable.
Something like
check_column_types
or similar, with a description likeFail the sync if the incoming stream schema doesn't match the type of an existing column
.Wdyt?
cc @visch
That said, the fact that the tap doesn't support schema overrides is bad so I dug a bit and submitted a PR transferwise/pipelinewise-tap-mysql#186.
I did test it a bit but let's hope I didn't do something terrible and the Wise folks accept it 😅.
In the meantime it can be tested by updating the
pip_url
. This is the project I validated these changes with:plugins: extractors: - name: tap-mysql variant: transferwise pip_url: "'pipelinewise-tap-mysql @ git+https://github.com/edgarrmondragon/pipelinewise-tap-mysql.git@allow-schema-overrides'" config: user: root host: 'localhost' database: mysql filter_dbs: mysql select: - "mysql-tests.*" schema: mysql-tests: my_uuid: type: [string, "null"] format: uuid inclusion: available
That seems like a reasonable setting idea! Gives folks a way to allow postgres to just figure it out for you which will work in a lot of cases.
I think there's a whole flow with the other targets around different data types https://www.stitchdata.com/docs/replication/loading/understanding-table-structural-changes#columns-mixed-data-types that we haven't gone down but we could.
I kind of like the current method as it's explicit and instead of making columns you don't expect it just fails and then you have to create a solution (we could do a much better job explaining what's going on when it fails for this reason) but it all depends on the use case people have.
I see your PR got closed without comment. That's sad. But thanks for it. I'm going to keep on using your branch until we decommission our last MySQL servers. (Hopefully by end of the year).
So, considering that, I feel that this PR can be closed.
Thanks for your assistance!
Have some data that is of type UUID where the source treats it as text.
Instead of failing with an exception:
Just treat them as compatible.
I couldn't think of a different way to do this, as even though the Postgres integration supports UUID, not all do. And technically UUID isn't TEXT (just that it's represented as text in sql, so the DB handles it transparently), so it shouldn't be part of
merge_sql_types
.