crate / commons-codec

Data decoding, encoding, conversion, and translation utilities.
https://commons-codec.readthedocs.io
GNU Lesser General Public License v3.0
2 stars 2 forks source link

Observations from DMS Replication tests #11

Open hammerhead opened 3 months ago

hammerhead commented 3 months ago

While testing out the DMS transformation, I made a few observations I want to share. Each issue still needs to be specified further with proper steps to reproduce, this is more of a brain dump meta collection. All observations were made on a PostgreSQL RDS source.

DELETE without primary key

Issuing a DELETE FROM my_table (aka truncate) on the source fails to replicate with Unable to invoke DML operation without primary key information. Currently, a WHERE clause with the primary key is required:

https://github.com/daq-tools/commons-codec/blob/e9f17e90bd4ad044cb6e4d942868dd868af26e1b/src/commons_codec/transform/aws_dms.py#L88

ALTER TABLE ADD COLUMN

I tried an ALTER TABLE my_table ADD COLUMN last_updated TIMESTAMP DEFAULT, which was ignored as being an unsupported operation. Subsequent UPDATE my_table SET last_updated = NOW() WHERE id = 1; therefore also fail. What can be a strategy to handle such situations? Do we need to manually migrate the CrateDB table structure?

Similarly, if my PostgreSQL table looks like this:

CREATE TABLE public.locations (
   device_id TEXT NOT NULL PRIMARY KEY,
   location TEXT,
   attributes JSONB
);

I also cannot have the addition of a new JSON property replicated (again no primary key): UPDATE locations SET attributes['last_updated'] = TO_JSONB(NOW());

Data types within OBJECT

If there is a JSON in PostgreSQL with a timestamp, such as {"last_updated": "2024-08-09T11:54:36.815606Z"}, the OBJECT in CrateDB recognizes last_updated as TEXT. Since there is no type information in the JSON, there is probably nothing the CDC translator can do. For real-world use cases, we need to think of some strategy (run ALTER TABLE ADD COLUMN in CrateDB upfront?).

amotl commented 3 months ago

Hi, and thanks for your reports. I think you explained a few cases very well where it becomes clear that the current implementation just provides fundamental mappings of basic INSERT|UPDATE|DELETE statements. We will need to check how corresponding DMS events look like, and whether we can apply their semantics to CrateDB, and how.

amotl commented 2 months ago

In general, Debezium/PostgreSQL also do not support DDL operations / schema changes?

The [PostgreSQL] connector relies on and reflects the PostgreSQL logical decoding feature, which has the following limitations:

  • Logical decoding does not support DDL changes. This means that the connector is unable to report DDL change events back to consumers.
  • ...

-- https://debezium.io/documentation/reference/connectors/postgresql.html