Open alston111111 opened 4 years ago
Thanks for filing the issue. The read filtering functionality is meant to present a consistent view of row data across the indexes of a table with TTL enabled. If it's disabled, inconsistencies are expected to show up. One could argue this type of setting shouldn't be made available because of the consistency issues it causes, but it has been useful for debugging. It's not meant to be disabled for queries that require a consistent view of the indexes.
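To make the idea concrete, here is a minimal sketch (my own illustration, not MyRocks source) of what TTL read filtering does: every read checks the timestamp stored with the record and skips entries whose TTL has elapsed, so expired rows never reach the SQL layer regardless of which index served the read.

```python
import time

TTL_SECONDS = 3600  # hypothetical per-table TTL

def is_expired(record_ts, now=None, ttl=TTL_SECONDS):
    """A record is considered dead once its TTL has elapsed."""
    now = time.time() if now is None else now
    return record_ts + ttl <= now

def read_visible(records, now, ttl=TTL_SECONDS):
    """Filtered read: expired entries are skipped at read time."""
    return [r for r in records if not is_expired(r["ts"], now, ttl)]

now = 10_000
records = [{"key": "a", "ts": now - 7200},  # expired
           {"key": "b", "ts": now - 60}]    # still live
print([r["key"] for r in read_visible(records, now)])  # -> ['b']
```

With filtering switched off, `read_visible` would degenerate to returning all physically present records, including expired ones that compaction has not reclaimed yet, which is where the inconsistencies come from.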
Thanks for your reply. Yes, I agree with the recommendation. We disabled TTL read filtering as a temporary workaround for a problem that was possibly related to issue #1024 .
With TTL read filtering enabled, executing "INSERT ON DUPLICATE KEY UPDATE" would return a "Can't find record" error if there is a conflicting unique key value that was TTL-expired but not yet removed by compaction.
Should I report it in detail as a new issue and link it to #1024 , or just add comments to issue #1024 ?
Referencing issue #1024 is sufficient for the TTL bug. I believe we understand the issue at hand: the filtering check is not applied during ha_rocksdb::check_and_lock_sk(). It should be applied in a manner similar to ha_rocksdb::check_and_lock_unique_pk(), but unfortunately we have not been able to make any progress on fixing this yet.
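The failure mode described above can be simulated in a few lines. This is a hedged toy model (the function names mirror the discussion but are not the MyRocks implementations): the unique-key conflict check on the secondary key looks only at physical presence and skips the TTL filter, while the follow-up read does apply it, so the statement sees a conflict it can then no longer find.

```python
TTL = 100
NOW = 1_000

def expired(ts):
    return ts + TTL <= NOW

# One expired row still physically present (not yet compacted away).
sk_index = {"uk1": {"pk": 1, "ts": NOW - 500}}

def check_and_lock_sk(uk):
    # Bug analogue: checks physical presence only, ignoring TTL.
    return uk in sk_index

def filtered_read(uk):
    # Normal read path: TTL filter applied.
    rec = sk_index.get(uk)
    return rec if rec and not expired(rec["ts"]) else None

def insert_on_dup_key_update(uk):
    if check_and_lock_sk(uk):       # "conflict" on an expired key
        row = filtered_read(uk)     # filtered read cannot see it
        if row is None:
            return "ERROR: Can't find record"
        return "updated"
    return "inserted"

print(insert_on_dup_key_update("uk1"))  # -> ERROR: Can't find record
```

Applying the TTL check inside `check_and_lock_sk` (as is done for the primary key path) would make the conflict check and the read agree.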
OK. Thanks!
With the following table:
Keep inserting into the table and it will eventually become corrupted: some index records of a TTL-expired row disappear due to compactions, leaving the row inconsistent.
case:
The following configuration items are set to increase the compaction frequency and thus the probability of reproducing the issue:
An example after several minutes:
Possible Reasons:
It can be deduced from the MyRocks record format and the RocksDB file organization that
(1) Records of the same index in a RocksDB table are clustered together in each LSM level except L0, though they may span multiple SST files;
(2) The records of a row's different indexes are mostly scattered, possibly across different SST files and even different levels of the LSM tree;
ref: the compaction-picking source code in compaction_picker.cc and https://github.com/facebook/rocksdb/wiki/Compaction
For these reasons, the index records of a single row may be only partly involved in one compaction, and if they are TTL-expired, the row (as seen at the SQL/storage-engine layer) becomes inconsistent.
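The partial-compaction scenario above can be sketched as a toy model (my own illustration, not RocksDB code): the primary-key record and the secondary-key record of one expired row live in different SST files, and a compaction that picks only one of those files drops the expired entry there, leaving the other index with a dangling record.

```python
TTL, NOW = 100, 1_000
row_ts = NOW - 500                       # the row is TTL-expired

# The two index records of one row sit in different SST files.
sst_files = {
    "file_pk": {("pk", 1): row_ts},      # primary-key record
    "file_sk": {("sk", "uk1"): row_ts},  # secondary-key record
}

def compact(name):
    """Compaction physically drops entries whose TTL has elapsed."""
    sst_files[name] = {k: ts for k, ts in sst_files[name].items()
                       if ts + TTL > NOW}

compact("file_pk")                       # only one file is picked

pk_present = ("pk", 1) in sst_files["file_pk"]
sk_present = ("sk", "uk1") in sst_files["file_sk"]
print(pk_present, sk_present)            # -> False True (inconsistent)
```

With read filtering ON, neither entry is visible, so the inconsistency is hidden; with it OFF, a secondary-index scan still finds the row while a primary-key lookup does not.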
Note: I think this might stem from two deeper causes (this is just my point of view):
Compactions take place inside RocksDB, which has no knowledge of the constraints in the MyRocks handler layer or the SQL layer. This is different from InnoDB, which implements the storage engine layer as a whole.
With TTL read filtering = ON and the TTL timestamp stored in each entry, there is a way to mark an entry as "can-be-purged" (comparable to delete-marked records or obsolete old-version records in InnoDB). When TTL read filtering is set OFF, the can-be-purged state of an index record is no longer well defined in the natural state transition: (entry) being-used (in-trx) => not-used (in-trx) => can-be-purged (not visible any more) => purged.
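The state transition described above can be written out as a small sketch (my reading of the argument, not MyRocks code): with read filtering ON an expired entry moves into a well-defined can-be-purged state (invisible but not yet reclaimed), while with filtering OFF that intermediate state collapses and the entry remains visible until compaction physically purges it.

```python
from enum import Enum, auto

class EntryState(Enum):
    BEING_USED = auto()
    NOT_USED = auto()
    CAN_BE_PURGED = auto()   # invisible, awaiting compaction
    PURGED = auto()

def state(ts, ttl, now, purged, read_filtering):
    """Classify an index entry under the lifecycle sketched above."""
    if purged:
        return EntryState.PURGED
    if ts + ttl <= now:                      # TTL elapsed
        return (EntryState.CAN_BE_PURGED if read_filtering
                else EntryState.NOT_USED)    # OFF: still visible
    return EntryState.BEING_USED

print(state(0, 100, 500, False, True))   # -> EntryState.CAN_BE_PURGED
print(state(0, 100, 500, False, False))  # -> EntryState.NOT_USED
```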
So, I ran into this case and am just reporting it here. (I don't know if this is the appropriate place :) )