Open sync-by-unito[bot] opened 1 year ago
➤ Alice Lottini commented:
As per Biyou's suggestion, these options can be used to avoid importing tombstones:
./bin/dsbulk load h
➤ Yuki Morishita commented:
I tested with the nullString and nullValue options, but it didn't work.
I think the problem is that the writetime for a null value is empty in the CSV, so when loading, either the timestamp is not set or the current timestamp is used, and the null value is inserted with a recent timestamp.
➤ Wei Deng commented:
Yuki Morishita is this still an issue?
➤ Yuki Morishita commented:
Wei Deng I believe so; this is a fundamental problem with CSV import of deleted records. It can happen in dsbulk 1.8+ with the preserve timestamp option.
{quote}I think the reason this happens is that we have null in the col2 column, and the cloud migrator imports with INSERT INTO ks.test (pk, cl, col2) VALUES (:pk, :cl, :col2) USING TIMESTAMP :col2_writetime AND TTL :col2_ttl;. The col2 column will be inserted, but there is no col2_writetime output in the CSV, so this creates the null column with a timestamp higher than that of the DELETE.{quote}
-connector.csv.nullValue ""
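The hypothesis in the comments above (an empty writetime field in the CSV ends up as a recent timestamp at load time) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the CSV column layout, the `resolve_timestamp` helper, and the fallback-to-now rule are hypothetical and are not taken from dsbulk or the migrator.

```python
import csv
import io


def resolve_timestamp(writetime_field, now_micros):
    """Pick the write timestamp for one column (hypothetical rule).

    Assumption: an empty writetime field falls back to the load-time clock.
    """
    if writetime_field == "":
        return now_micros
    return int(writetime_field)


# Hypothetical export layout: pk, cl, col2, writetime(col2), ttl(col2).
# The trailing ",,," means col2 is null and has no writetime or ttl.
exported = "pk1,cl1,,,\n"
pk, cl, col2, col2_writetime, col2_ttl = next(csv.reader(io.StringIO(exported)))

delete_ts = 1_000   # made-up timestamp of the earlier DELETE
now_micros = 2_000  # made-up stand-in for the clock at load time

insert_ts = resolve_timestamp(col2_writetime, now_micros)
# The write wins over the DELETE only if its timestamp is higher.
resurrects = insert_ts > delete_ts
print(insert_ts, resurrects)
```

Under these assumptions the null column is written with the load-time timestamp, which is later than the DELETE, so the write is not shadowed by the tombstone.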
This may relate to #23, but I hit a case where the migrator resurrected a deleted row.
The customer has already deleted the exported CSVs, so what follows is a hypothesis, but I confirmed the case:
The migrator command line options used
Steps to reproduce
null value: (CREATE TABLE test (pk uuid, cl text, col1 decimal, col2 text, PRIMARY KEY (pk, cl)));
The last ,,, indicates that the col2 column is null and thus has neither writetime nor ttl.
SELECT * FROM test should not return the row, because we deleted it.
SELECT * FROM test
again, you will see the row again, with the regular columns null and only the primary key columns populated.
I think the reason this happens is that we have null in the col2 column, and the cloud migrator imports with INSERT INTO ks.test (pk, cl, col2) VALUES (:pk, :cl, :col2) USING TIMESTAMP :col2_writetime AND TTL :col2_ttl;. The col2 column will be inserted, but there is no col2_writetime output in the CSV, so this creates the null column with a timestamp higher than that of the DELETE.
┆Issue is synchronized with this Jira Task by Unito ┆Components: Schema Migrator ┆Priority: Major
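The resurrection described in this issue can be sketched with a toy last-write-wins model in Python. This is a simplified illustration, not Cassandra's storage engine: the `Row` class, its fields, and the timestamps 100, 200, and 300 are all hypothetical, and the model only assumes that an INSERT leaves a row marker and that a row-level DELETE shadows everything written at or before its timestamp.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Row:
    """Toy model of one CQL row: row tombstone, row marker, and cells."""
    deleted_at: Optional[int] = None   # timestamp of the row-level DELETE
    marker_ts: Optional[int] = None    # timestamp of the newest INSERT
    cells: dict = field(default_factory=dict)  # column -> (value, timestamp)

    def insert(self, values, ts):
        if self.marker_ts is None or ts > self.marker_ts:
            self.marker_ts = ts
        for col, val in values.items():
            old = self.cells.get(col)
            if old is None or ts >= old[1]:
                self.cells[col] = (val, ts)

    def delete(self, ts):
        if self.deleted_at is None or ts > self.deleted_at:
            self.deleted_at = ts

    def read(self):
        """Return the visible regular columns, or None if the row is not live."""
        cutoff = self.deleted_at if self.deleted_at is not None else -1
        live_marker = self.marker_ts is not None and self.marker_ts > cutoff
        visible = {c: v for c, (v, ts) in self.cells.items()
                   if ts > cutoff and v is not None}
        if not live_marker and not visible:
            return None
        return visible


row = Row()
row.insert({"col1": 1, "col2": None}, ts=100)  # original write
row.delete(ts=200)                             # row deleted after the export
before = row.read()

# Re-import: col1 keeps its exported writetime, but col2 had no writetime in
# the CSV, so (by assumption) it is written with the load-time timestamp.
row.insert({"col1": 1}, ts=100)
row.insert({"col2": None}, ts=300)
after = row.read()

print(before, after)
```

In this model `before` is `None` (the row is gone), while `after` is an empty dict: the row is live again with no regular column values, matching the symptom of a row that shows only its primary key columns.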