Open hammerhead opened 3 months ago
Hi, and thanks for your reports. I think you explained a few cases very well where it becomes clear that the current implementation just provides fundamental mappings of basic INSERT|UPDATE|DELETE statements. We will need to check how corresponding DMS events look like, and whether we can apply their semantics to CrateDB, and how.
In general, Debezium/PostgreSQL also do not support DDL operations / schema changes?
The [PostgreSQL] connector relies on and reflects the PostgreSQL logical decoding feature, which has the following limitations:
- Logical decoding does not support DDL changes. This means that the connector is unable to report DDL change events back to consumers.
- ...
-- https://debezium.io/documentation/reference/connectors/postgresql.html
While testing out the DMS transformation, I made a few observations I want to share. Each issue still needs to be specified further with proper steps to reproduce, this is more of a brain dump meta collection. All observations were made on a PostgreSQL RDS source.
DELETE without primary key
Issuing a
DELETE FROM my_table
(aka truncate) on the source fails to replicate withUnable to invoke DML operation without primary key information
. Currently, aWHERE
clause with the primary key is required:https://github.com/daq-tools/commons-codec/blob/e9f17e90bd4ad044cb6e4d942868dd868af26e1b/src/commons_codec/transform/aws_dms.py#L88
ALTER TABLE ADD COLUMN
I tried an
ALTER TABLE my_table ADD COLUMN last_updated TIMESTAMP DEFAULT
, which was ignored as being an unsupported operation. SubsequentUPDATE my_table SET last_updated = NOW() WHERE id = 1;
therefore also fail. What can be a strategy to handle such situations? Do we need to manually migrate the CrateDB table structure?Similarly, if my PostgreSQL table looks like this:
I also cannot have the addition of a new JSON property replicated (again no primary key):
UPDATE locations SET attributes['last_updated'] = TO_JSONB(NOW());
Data types within OBJECT
If there is a
JSON
in PostgreSQL with a timestamp, such as{"last_updated": "2024-08-09T11:54:36.815606Z"}
, theOBJECT
in CrateDB recognizeslast_updated
asTEXT
. Since there is no type information in the JSON, there is probably nothing the CDC translator can do. For real-world use cases, we need to think of some strategy (runALTER TABLE ADD COLUMN
in CrateDB upfront?).