aws-samples / cql-replicator

CQLReplicator is a migration tool that helps you to replicate data from Cassandra to AWS Services
Apache License 2.0
16 stars 8 forks source link

nested_object conversion issue #165

Open dubrovine opened 1 month ago

dubrovine commented 1 month ago

https://github.com/aws-samples/cql-replicator/blob/24cc53f1b8c80c81204165cf4dbb55c1915920fc/glue/sbin/keyspaces/CQLReplicator.scala#L174

if we have such scheme in aws keyspaces

CREATE TABLE channel.channels (
    id timeuuid PRIMARY KEY,
    alias text,
    company_id text,
    created timestamp,
    description text,
    name text,
    photo text,
    sessiondata text,
    sessionid timeuuid,
    sessionlastid timeuuid,
    state text,
    type text,
    updated timestamp,
    user frozen<user>,
    users map<text, text>
)

users map<text, text> generating nested_object like (when we exporting it from cassandra table):

"users":"{\"g.Wngn-ynR5hl-AAAC\":{\"id\":\"g.Wngn-ynR5hl-AAAC\",\"name\":null,\"photo\":null,\"type\":\"g\",\"t\":null}}"

there is no change to store string like this so processed string should looks like:

users":"{\"g.Wngn-ynR5hl-AAAC\":\"{\\\"id\\\":\\\"g.Wngn-ynR5hl-AAAC\\\",\\\"name\\\":null,\\\"photo\\\":null,\\\"type\\\":\\\"g\\\",\\\"t\\\":null}\"}"

nwheeler81 commented 1 month ago

Hi @dubrovine, I tried to replicate your issue, here is my result: Source: image Target: image Both your examples replicated "as is" with no changes, no extra slashes. Cloud you please provide more context on your issue.

dubrovine commented 1 month ago

Sorry. I forgot to mention that we are trying to replicate from UDT

here is the original cassandra scheme:

`CREATE TABLE channel.channels (
    id timeuuid PRIMARY KEY,
    alias text,
    company_id text,
    created timestamp,
    description text,
    name text,
    photo text,
    sessiondata text,
    sessionid timeuuid,
    sessionlastid timeuuid,
    state text,
    type text,
    updated timestamp,
    user frozen<user>,
    users map<text, frozen<user>>
)`

users map<text, frozen<user>>

here is keyspaces scheme:

`CREATE TABLE channel.channels (
    id timeuuid PRIMARY KEY,
    alias text,
    company_id text,
    created timestamp,
    description text,
    name text,
    photo text,
    sessiondata text,
    sessionid timeuuid,
    sessionlastid timeuuid,
    state text,
    type text,
    updated timestamp,
    user text,
    users map<text, text>
)`

users map<text, text>

so we trying to avoid UDTs and using text (json string instead)

for example we exports one record from cassandra & receiving such query

INSERT INTO channel.channels JSON '{"id":"c6f05420-0f1c-11e8-8ca4-eff552a6d2cd","alias":null,"company_id":"i88430253129","created":"2018-02-11 11:14:50.596Z","description":null,"name":"test","photo":null,"sessiondata":null,"sessionid":null,"sessionlastid":null,"state":null,"type":"help","updated":null,"user":"{\"id\":\"g.Wngn-ynR5hl-AAAC\",\"name\":null,\"photo\":null,\"type\":\"g\",\"t\":null}","users":"{\"g.Wngn-ynR5hl-AAAC\":{\"id\":\"g.Wngn-ynR5hl-AAAC\",\"name\":null,\"photo\":null,\"type\":\"g\",\"t\":null}}"}'

I am trying to insert directly into keyspaces table & got error

Снимок экрана 2024-10-04 в 08 05 45

p:s might it would be better to delete params with null values (it might decrease disc usage)

nwheeler81 commented 1 month ago

@dubrovine I think, the problem is in the structure map<text, UDT> that has V defined by the frozen UDT. Currently, the CQLReplicator can transform only non-nested UDTs, for example, the user: frozen column maps to user: text in the target table. Let me check if there is quick workarounds.

nwheeler81 commented 1 month ago

@dubrovine you are missing double quotes around UDT inside MAP: "users":"{\"g.Wngn-ynR5hl-AAAC\":"{\"id\":\"g.Wngn-ynR5hl-AAAC\",\"name\":null,\"photo\":null,\"type\":\"g\",\"t\":null}"}".

dubrovine commented 1 month ago

yeap. you're right (I was mentioned it in the first comment) is it possible to add double quotes in CQLReplicator.scala ?

nwheeler81 commented 1 month ago

@dubrovine I've added another support function to transform collections with UDT to collections with TEXT (MAP<Type, UDT> to MAP<Type, Text>):

cqlreplicator --state run --safe-mode-disabled --landing-zone s3://cql-replicator-1234567890-us-east-1-demo --region us-east-1 --tiles 1 --worker-type G.025X --src-keyspace issues --src-table channels --trg-keyspace issues --trg-table channels --orw 8000000 --env demo --json-mapping '{"keyspaces": {"udtConversion": { "enabled": true,"columns":["user"]}}}' --writetime-column updated

image Limitations: Supports only Map structure. Pleas use this branch to test it in your environment. Please, keep me posted after your tests I will merge it.

dubrovine commented 2 weeks ago

hello there. to be honest we didn't have time to check everything carefully since we received this news AWS KEYSPACES UDT SUPPORT

actually THIS seems like fix everything and we don't need to make any conversion