Open perolavsvendsen opened 5 months ago
If you use the keep_alive parameter (which triggers use of PIT), you will get consistent results. Documented here: https://fmu-sumo.readthedocs.io/en/latest/explorer.html#pagination-iterating-over-large-resultsets
from fmu.sumo.explorer import Explorer
sumo = Explorer(env="prod", keep_alive="5m")
I tested this several times for your example data and got consistent results every time keep_alive/PIT was used. When keep_alive/PIT was not used, I also saw the variability.
I propose adding this issue as input to the general discussion of how fmu-sumo should best utilize the Elasticsearch PIT (Point-In-Time).
fmu-sumo PIT issue: https://github.com/equinor/fmu-sumo/issues/254
Raymond suggested looking at 'sort'. It turns out that fmu-sumo already uses sort in its queries (https://github.com/equinor/fmu-sumo/blob/7d44fee30abfb7e6e250a445f6ed41dd5e2fd14c/src/fmu/sumo/explorer/objects/_document_collection.py#L169)
Raymond also proposed trying the TEST env, which sees very little use. There, the same uuid came back on every query, which indicates that index changes are what cause the variation in query results.
The solution is likely to use PIT. Closing this issue, as it should be solved by the fmu-sumo PIT issue: https://github.com/equinor/fmu-sumo/issues/254
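To illustrate the reasoning above, here is a minimal toy model of why a changing index can return different hits for the same query, while a frozen view does not. It is not Elasticsearch or fmu-sumo code: `ToyIndex`, `snapshot`, `background_change` and `first_uuid` are hypothetical names, and the tie-breaking mechanism (documents sharing the same sort value, ordered by their internal position) is an assumption chosen to make the effect visible, not a confirmed diagnosis of this issue.

```python
import copy


class ToyIndex:
    """Hypothetical stand-in for a search index that changes over time."""

    def __init__(self, docs):
        self.docs = list(docs)

    def snapshot(self):
        # Loosely analogous to opening a PIT: freeze the current view.
        return ToyIndex(copy.deepcopy(self.docs))

    def background_change(self):
        # Simulate index activity by rotating the internal document order.
        self.docs = self.docs[1:] + self.docs[:1]

    def first_uuid(self, sort_key):
        # sorted() is stable, so ties on sort_key fall back to internal order.
        return sorted(self.docs, key=lambda d: d[sort_key])[0]["uuid"]


def run_queries(index, n, mutate):
    """Run the same query n times, calling mutate() after each query."""
    results = []
    for _ in range(n):
        results.append(index.first_uuid("timestamp"))
        mutate()
    return results


# Three documents that all share the same sort value (an assumption).
live = ToyIndex(
    [
        {"uuid": "a", "timestamp": 1},
        {"uuid": "b", "timestamp": 1},
        {"uuid": "c", "timestamp": 1},
    ]
)
frozen = live.snapshot()

# Querying the live index while it changes gives varying first hits.
live_results = run_queries(live, 3, live.background_change)
# Querying the frozen view is unaffected by changes to the live index.
frozen_results = run_queries(frozen, 3, live.background_change)

print("live:  ", live_results)
print("frozen:", frozen_results)
```

In this sketch the frozen view plays the role of a query with keep_alive set, and the live index plays the role of a busy environment such as prod. A quiet environment like TEST corresponds to never calling `background_change`, in which case the live results are stable as well.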
PIT may solve this for interactive use, but if a user runs a script on a Monday, and then runs the same script on Friday, the results will be different. This is just the way it is, but we need to find a way to communicate this clearly to avoid confusion. It is not intuitive to most, I suspect. Also, it is very likely that the PIT solution will spread and people will start setting very long PIT on every single query they run. I guess this will be costly?
Example code:
This will sometimes print the same uuid, sometimes not. ⚠️ This could potentially be very confusing for end users, as the results are not reproducible.