Closed by frozensky 9 months ago
@frozensky
./cqlreplicator --state run --tiles 2 --landing-zone s3://cql-replicator-1234567890-us-east-1 --writetime-column modificationtime --region us-east-1 --src-keyspace quark1 --src-table personalcontentslists --trg-keyspace quark2 --trg-table personalcontentslists
Source: quark1.personalcontentslists
Target: quark2.personalcontentslists
I was not able to reproduce it.
Options to validate:
Maybe sparkSession.read is trying to read an empty S3 prefix, and the schema is missing the columns.
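The hypothesis above can be illustrated without Spark. The sketch below is a plain-Python analogy, not CQLReplicator or Spark code: `infer_schema` and `select` are hypothetical helpers that mimic inferring a schema from whatever records sit under a prefix. With no records, the inferred schema has no columns, so a later lookup of a key column fails with a "does not exist" error.

```python
from typing import Dict, Iterable, List


def infer_schema(records: Iterable[Dict[str, object]]) -> List[str]:
    """Return the sorted union of keys seen in the records (hypothetical helper)."""
    columns = set()
    for record in records:
        columns.update(record.keys())
    return sorted(columns)


def select(records: List[Dict[str, object]], schema: List[str], column: str) -> List[object]:
    """Fetch one column, failing when the schema does not contain it."""
    if column not in schema:
        raise KeyError(f"column '{column}' does not exist in schema {schema}")
    return [record.get(column) for record in records]


# Records as they might look under a populated prefix (illustrative values only).
populated: List[Dict[str, object]] = [{"accountid": "a1", "uxrowid": "r1"}]
# An empty prefix yields no records at all.
empty: List[Dict[str, object]] = []

populated_schema = infer_schema(populated)
empty_schema = infer_schema(empty)

try:
    select(empty, empty_schema, "accountid")
    lookup_error = ""
except KeyError as exc:
    lookup_error = str(exc)
```

If the real failure follows this pattern, listing the landing-zone prefix before the rerun should show whether it is empty.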
Options to try:
- Use --safe-mode-disabled with --state run to enable the MEMORY_AND_DISK_SER caching strategy.
- Set inferSchema to false in keysDiscoveryProcess.
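The idea behind turning off schema inference can be sketched as follows. This is a plain-Python analogy under stated assumptions, not the actual keysDiscoveryProcess code: `load_keys` and `column_values` are hypothetical helpers, and the fixed column list is taken from the PRIMARY KEY (accountid, uxrowid) in the table schema reported in this issue. Because the schema is supplied up front, an empty input still exposes the key columns and simply yields no rows.

```python
from typing import Dict, Iterable, List, Optional, Sequence

# Primary-key columns of personalcontentslists, copied from the table schema in this issue.
KEY_COLUMNS: Sequence[str] = ("accountid", "uxrowid")


def load_keys(
    records: Iterable[Dict[str, str]],
    schema: Sequence[str] = KEY_COLUMNS,
) -> List[Dict[str, Optional[str]]]:
    """Project each record onto a fixed schema instead of inferring one (hypothetical helper)."""
    return [{column: record.get(column) for column in schema} for record in records]


def column_values(
    rows: List[Dict[str, Optional[str]]],
    schema: Sequence[str],
    column: str,
) -> List[Optional[str]]:
    """Return one column; the lookup depends on the declared schema, not on the data."""
    if column not in schema:
        raise KeyError(f"column '{column}' does not exist in schema {list(schema)}")
    return [row[column] for row in rows]


# Empty input: the column still resolves and the result is just an empty list.
empty_rows = load_keys([])
empty_accounts = column_values(empty_rows, KEY_COLUMNS, "accountid")

# Populated input: fields outside the declared schema are dropped.
populated_rows = load_keys([{"accountid": "a1", "uxrowid": "r1", "extra": "ignored"}])
populated_accounts = column_values(populated_rows, KEY_COLUMNS, "accountid")
```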
The issue: an inconsistent state of the primary keys on S3.
Describe the bug
Rerunning cqlreplicator to continue replication causes the discovery tile to fail with an error claiming that a column does not exist.
To Reproduce
Steps to reproduce the behavior:
Run the command './cqlreplicator --state run --tiles 40 --writetime-column modificationtime --landing-zone s3://cqlrep-prd-1 --region us-west-1 --src-keyspace quark --src-table personalcontentslists --trg-keyspace quark --trg-table personalcontentslists --override-rows-per-worker 2000000 --inc-traffic'
Table schema
CREATE TABLE quark.personalcontentslists (
    accountid text,
    uxrowid text,
    elementcount int,
    modelid text,
    modificationtime timestamp,
    sortedcontents text,
    sortedcontents_bucket_1 text,
    sortedcontents_bucket_2 text,
    sortedcontents_bucket_3 text,
    sortedcontents_bucket_4 text,
    sortedcontents_bucket_5 text,
    PRIMARY KEY (accountid, uxrowid)
) WITH CLUSTERING ORDER BY (uxrowid ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND nodesync = {'enabled': 'true'}
    AND speculative_retry = '99PERCENTILE';