Historically, we had several tests that overrode the timestamp for writes to avoid test failures when multiple writes occurred within the same millisecond. By leveraging https://github.com/TileDB-Inc/TileDB/pull/4800, we modified these tests in https://github.com/TileDB-Inc/TileDB-Py/pull/1953 to run both with and without these overrides.

`DenseArrayTest::test_open_with_timestamp[False]` has frequently failed locally, on the TileDB-Py CI, and in the centralized-tiledb-nightlies.

It turns out that the logic of this test was not entirely correct. The test writes three fragments, one after another, and then opens each fragment at its corresponding timestamp to verify the correctness of its contents. The mistake was in how the timestamps for the reads were collected: after the array was opened for writing and the new content was written, it was reopened solely to retrieve the timestamp range.

Opening the array in read mode immediately after closing it from write mode is not expected to give the same timestamp range. Since the range depends on the current time, the reopen returns a slightly larger end value. This process was repeated three times in the test.
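For illustration, here is a minimal sketch of the flawed pattern (not the test's literal code). The array URI, schema, and attribute name are placeholders, and it assumes the open array exposes its open timestamps via the `Array.timestamp_range` property:

```python
import numpy as np
import tiledb

uri = "timestamp_demo_array"  # hypothetical URI, used only for this sketch

# Minimal 1-D dense array with a single int64 attribute "a".
dom = tiledb.Domain(tiledb.Dim(name="d", domain=(0, 9), tile=10, dtype=np.int64))
schema = tiledb.ArraySchema(domain=dom, attrs=[tiledb.Attr(name="a", dtype=np.int64)])
tiledb.Array.create(uri, schema)

# Write one fragment.
with tiledb.open(uri, mode="w") as A:
    A[:] = {"a": np.arange(10)}

# Flawed step: reopen the array in read mode only to collect a timestamp for
# the later verification read. The end of this range comes from the current
# time at open, so it is slightly later than the end timestamp recorded in
# the fragment name on disk.
with tiledb.open(uri, mode="r") as A:
    read_end = A.timestamp_range[1]

# If another fragment is written within those few milliseconds, opening at
# `read_end` can return that newer fragment instead of the intended one.
with tiledb.open(uri, mode="r", timestamp=read_end) as A:
    print(A[:]["a"])
```

The gap between `read_end` and the end timestamp recorded in the fragment name is what allows a verification read to slip onto the next fragment.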
For reference, without using any sleeps, if we compare the timestamp end received this way with the actual timestamp end found in the fragment name on disk, we expect something like the following table:
| Fragment | Read timestamp end | Actual timestamp end |
|----------|--------------------|----------------------|
| 1        | 1728909483655      | 1728909483653        |
| 2        | 1728909483661      | 1728909483658        |
| 3        | 1728909483665      | 1728909483661        |
The problem is that, for example, when trying to read the second fragment using `1728909483661`, the actual timestamp of the fragment is `1728909483658`. This means that there is a chance of reading the third fragment instead of the second. Adding sleep calls in https://github.com/TileDB-Inc/TileDB-Py/pull/92 makes the problem less frequent, but it still exists.
This PR solves the problem by using `FragmentInfoList` to get the actual fragment timestamps.
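A minimal sketch of that approach, assuming `tiledb.array_fragments` (which returns a `FragmentInfoList`) and the per-fragment `timestamp_range` attribute; the URI and the verification step are placeholders:

```python
import tiledb

uri = "timestamp_demo_array"  # same hypothetical array as in the sketch above

# FragmentInfoList reports the timestamps actually recorded in each fragment's
# name, so every verification read targets exactly the fragment it checks.
fragments = tiledb.array_fragments(uri)
for frag in fragments:
    _, actual_end = frag.timestamp_range
    with tiledb.open(uri, mode="r", timestamp=actual_end) as A:
        data = A[:]["a"]
        # ... compare `data` with what this fragment is expected to contain ...
```

Because the timestamp now comes from the fragment itself rather than from the wall clock at reopen time, the read can no longer drift onto a later fragment.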
Kudos to @ypatia, @teo-tsirpanis, and @ktsitsi for helping me investigate this!
Should fix the related error in https://github.com/TileDB-Inc/centralized-tiledb-nightlies/issues/24 cc @jdblischak
[sc-56405]