DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Document order of replicas is nondeterministic #6442

Open nadove-ucsc opened 1 month ago

nadove-ucsc commented 1 month ago

While working on https://github.com/DataBiosphere/azul/issues/6122, we had difficulty asserting the contents of the verbatim PFB manifest because the replicas appeared in an inconsistent order. The order appeared stable on a personal deployment, but changed when pushing to GitHub. Our investigation revealed that the shard count affects the order of the replicas in the index, but patching the shard count to a fixed value (1) still did not result in a consistent order.

Currently, our workaround is to sort the manifests before comparing the expected and observed values.
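For readers unfamiliar with this kind of workaround, here is a minimal sketch of an order-insensitive comparison. It is not the actual Azul test code. It assumes the manifest records have already been decoded into JSON-serializable dicts, and the names `canonical_order` and `assert_same_records` are hypothetical.

```python
import json
from typing import Any, Iterable


def canonical_order(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort records by their JSON serialization with sorted keys.

    Assumption: every record is a JSON-serializable dict. Sorting on the
    serialized form avoids comparing dicts directly, which Python does
    not support.
    """
    return sorted(records, key=lambda r: json.dumps(r, sort_keys=True))


def assert_same_records(expected: Iterable[dict[str, Any]],
                        observed: Iterable[dict[str, Any]]) -> None:
    """Compare two collections of records while ignoring their order."""
    assert canonical_order(expected) == canonical_order(observed)


# Hypothetical records; the real manifest entities look different.
expected = [{"id": "a", "name": "donor"}, {"id": "b", "name": "sample"}]
observed = [{"name": "sample", "id": "b"}, {"id": "a", "name": "donor"}]
assert_same_records(expected, observed)
```

Sorting both sides keeps the assertion strict about content, including duplicates, while ignoring the order in which the index returned the replicas.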

nadove-ucsc commented 1 month ago

It remains undetermined whether the inconsistency is due to the order in which the replicas are written to the index, or whether it arises when reading them from the index.
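One possible way to narrow this down, sketched under assumptions: read the same, unchanged index several times and compare the order of the returned replicas. If the order varies between reads, the read path is at least partly responsible. If it is stable across reads but differs between fresh indexing runs, that would point toward the write path. The function `read_order_is_stable` and the `read_ids` callable are hypothetical and stand in for whatever code actually fetches the replica IDs.

```python
import itertools
from typing import Callable, Hashable, Sequence


def read_order_is_stable(read_ids: Callable[[], Sequence[Hashable]],
                         attempts: int = 5) -> bool:
    """Return True if repeated reads yield the IDs in the same order.

    Assumption: the index is not modified between the calls, so any
    difference in order can only come from the read path.
    """
    first = list(read_ids())
    return all(list(read_ids()) == first for _ in range(attempts - 1))


# Stub readers in place of a real index query (hypothetical IDs).
assert read_order_is_stable(lambda: ["r1", "r2"])

flipping = itertools.cycle([["r1", "r2"], ["r2", "r1"]])
assert not read_order_is_stable(lambda: next(flipping))
```

A stable result from a handful of reads does not prove determinism; it only fails to show read-side variation.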

dsotirho-ucsc commented 1 month ago

Assignee to consider next steps.

hannes-ucsc commented 1 week ago

We don't currently know what causes the non-determinism. We should probably ask Elastic or the Elasticsearch community.