TileDB-Inc / TileDB-Py

Python interface to the TileDB storage engine
MIT License
190 stars 33 forks source link

Fix randomly failing test `DenseArrayTest::test_open_with_timestamp[False]` #2090

Closed kounelisagis closed 1 month ago

kounelisagis commented 1 month ago

Historically, we had several tests that overrode the timestamp for writes to avoid test failures when multiple writes occurred within the same millisecond. By leveraging https://github.com/TileDB-Inc/TileDB/pull/4800, we modified these tests in https://github.com/TileDB-Inc/TileDB-Py/pull/1953 to run both with and without these overrides.

DenseArrayTest::test_open_with_timestamp[False] has frequently failed, locally, on the TileDB-Py CI, and the centralized-tiledb-nightlies.

It turns out that the logic of this test was not entirely correct. The test writes three fragments, one after another, and then opens each fragment using its corresponding timestamp to verify the correctness of its contents. The mistake was in how the timestamps for the reads were collected. After opening the array for writing and writing the new content, the array was reopened solely to retrieve the timestamp range.

Opening the array in read mode immediately after closing it from write mode isn't expected to give the same timestamp range. In fact, since it depends on the current time, it will return a slightly larger end value. This process was repeated three times in that test.

For reference, without using any sleeps, if we compare the received timestamp end with the actual timestamp end found in the fragment name on disk, we anticipate something like the following table:

read timestamp end actual timestamp end
1 1728909483655 1728909483653
2 1728909483661 1728909483658
3 1728909483665 1728909483661

The problem is that, for example, when trying to read the second fragment using 1728909483661, the actual timestamp of the fragment is 1728909483658. This means that there is a change to read the third fragment instead of the second. Adding sleep calls in https://github.com/TileDB-Inc/TileDB-Py/pull/92 makes the problem less frequent, but it still exists.

This PR solves the problem using FragmentInfoList to get the actual fragment timestamps.

Kudos to @ypatia, @teo-tsirpanis, and @ktsitsi for helping me investigate this!

Should fix the related error in https://github.com/TileDB-Inc/centralized-tiledb-nightlies/issues/24 cc @jdblischak


[sc-56405]