equinor / fmu-sumo

Interaction with Sumo in the FMU context
https://fmu-sumo.readthedocs.io/en/latest/
Apache License 2.0

Output from .filter has a random component #320

Open perolavsvendsen opened 5 months ago

perolavsvendsen commented 5 months ago

Example code:

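from fmu.sumo.explorer import Explorer

sumo = Explorer(env="prod")
mycase = sumo.get_case_by_uuid("2e711eae-e4bc-4c7a-9dbf-2e83c738b805")

# Run the same query twice and print the UUID of the first hit each time
mysurf = mycase.surfaces.filter(aggregation=True)[0]
print(mysurf.uuid)

mysurf = mycase.surfaces.filter(aggregation=True)[0]
print(mysurf.uuid)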

This sometimes prints the same UUID twice, and sometimes two different UUIDs ⚠️

This could be very confusing for end users, because the same code does not reproduce the same result.
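A quick way to check whether a query is reproducible is to run it several times and count the distinct UUIDs returned as the first hit. The helper below is a sketch, not part of fmu-sumo: `first_hit_uuids` and `is_reproducible` are hypothetical names, and the only assumption is that the query is wrapped in a zero-argument callable returning a UUID string. Offline stand-ins replace the Sumo call so the example runs without access to Sumo.

```python
import itertools
from collections import Counter
from typing import Callable


def first_hit_uuids(query: Callable[[], str], runs: int = 10) -> Counter:
    """Run `query` `runs` times and count how often each UUID comes back.

    A reproducible query gives a Counter with exactly one key.
    """
    return Counter(query() for _ in range(runs))


def is_reproducible(query: Callable[[], str], runs: int = 10) -> bool:
    """True if every run returned the same first-hit UUID."""
    return len(first_hit_uuids(query, runs)) == 1


if __name__ == "__main__":
    # Offline stand-ins for a Sumo query. With fmu-sumo the callable might be
    #   lambda: mycase.surfaces.filter(aggregation=True)[0].uuid
    flaky_order = itertools.cycle(["uuid-a", "uuid-b"])
    flaky = lambda: next(flaky_order)  # first hit alternates between calls
    stable = lambda: "uuid-a"          # first hit never changes

    print(first_hit_uuids(flaky))      # Counter({'uuid-a': 5, 'uuid-b': 5})
    print(is_reproducible(stable))     # True
```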

roywilly commented 5 months ago

If you set the keep_alive parameter (which makes fmu-sumo use an Elasticsearch PIT, Point-In-Time), you get consistent results. This is documented here: https://fmu-sumo.readthedocs.io/en/latest/explorer.html#pagination-iterating-over-large-resultsets

sumo = Explorer(env="prod", keep_alive="5m")

I tested this several times with your example data and got consistent results every time keep_alive/PIT was used. Without keep_alive/PIT, I also saw the variability.
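The effect of a PIT can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not how Elasticsearch or Sumo actually implement PIT: the live index is modelled as a list whose internal order can change as documents are written or merged, and a PIT is modelled as a frozen copy taken when it is opened. `ToyIndex`, `open_pit` and `reorganize` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy stand-in for a search index; NOT Elasticsearch's real mechanics."""
    docs: list[str] = field(default_factory=list)

    def search_first(self) -> str:
        # Without a unique sort key, the "first" hit is whatever
        # happens to come first in the index's current internal order.
        return self.docs[0]

    def reorganize(self) -> None:
        # Stand-in for background activity (new uploads, merges) that
        # changes the internal order without changing which docs match.
        self.docs.append(self.docs.pop(0))

    def open_pit(self) -> "ToyIndex":
        # A point-in-time view: a frozen copy of the index as it is now.
        return ToyIndex(docs=list(self.docs))


index = ToyIndex(docs=["uuid-a", "uuid-b", "uuid-c"])
pit = index.open_pit()

live_results, pit_results = [], []
for _ in range(3):
    live_results.append(index.search_first())
    pit_results.append(pit.search_first())
    index.reorganize()  # the live index keeps changing between queries

print(live_results)  # ['uuid-a', 'uuid-b', 'uuid-c']
print(pit_results)   # ['uuid-a', 'uuid-a', 'uuid-a']
```

In real Elasticsearch, an open PIT keeps older index segments alive, which costs extra disk space and file handles; the snapshot is only stable for as long as the PIT is kept alive.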

roywilly commented 4 months ago

I propose adding this issue as input to the general discussion of how fmu-sumo should best use the Elasticsearch PIT (Point-In-Time).

fmu-sumo PIT issue: https://github.com/equinor/fmu-sumo/issues/254

roywilly commented 4 months ago

Raymond suggested looking at 'sort'. It turns out that fmu-sumo already uses sort in its queries (https://github.com/equinor/fmu-sumo/blob/7d44fee30abfb7e6e250a445f6ed41dd5e2fd14c/src/fmu/sumo/explorer/objects/_document_collection.py#L169)

Raymond proposed trying the TEST environment, which sees very little use. There, the same UUID came back on every query, which indicates that changes to the index are what cause the variation in query results.
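One plausible way an existing sort can still give a varying first hit, assuming the sort key is not unique per document: documents with equal sort values have no defined relative order, so their order can shift when the index changes. Elasticsearch's documentation recommends a unique tiebreaker field in the sort for this reason, and PIT searches add an implicit `_shard_doc` tiebreaker. The plain-Python sketch below illustrates the idea; the `timestamp` and `uuid` fields are hypothetical and not taken from the actual sort in `_document_collection.py`.

```python
from operator import itemgetter
from typing import TypedDict


class Doc(TypedDict):
    uuid: str
    timestamp: str  # hypothetical sort field; several docs may share a value


def first_hit(docs: list[Doc], tiebreak: bool) -> str:
    """Return the uuid of the first document after sorting.

    Python's sort is stable, so documents with equal timestamps keep their
    incoming order, mimicking an engine that leaves ties unordered.
    With tiebreak=True, the unique uuid decides between equal timestamps.
    """
    key = itemgetter("timestamp", "uuid") if tiebreak else itemgetter("timestamp")
    return sorted(docs, key=key)[0]["uuid"]


# The same three matching documents in two different internal orders
# (e.g. before and after the index has changed).
before = [
    Doc(uuid="uuid-b", timestamp="2023-05-01"),
    Doc(uuid="uuid-a", timestamp="2023-05-01"),
    Doc(uuid="uuid-c", timestamp="2023-06-01"),
]
after = [before[1], before[0], before[2]]

print(first_hit(before, tiebreak=False), first_hit(after, tiebreak=False))  # uuid-b uuid-a
print(first_hit(before, tiebreak=True), first_hit(after, tiebreak=True))    # uuid-a uuid-a
```

A tiebreaker only fixes the order among ties; it cannot stop newly added or deleted documents from changing which document sorts first, which is the part a PIT addresses.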

The solution is likely to use PIT. Closing this issue, as it should be solved by the fmu-sumo PIT issue: https://github.com/equinor/fmu-sumo/issues/254

perolavsvendsen commented 4 months ago

PIT may solve this for interactive use, but if a user runs a script on Monday and runs the same script again on Friday, the results may be different. That is simply how it works, but we need to communicate it clearly to avoid confusion; I suspect it is not intuitive to most users. Also, the PIT workaround is very likely to spread, and people will start setting very long PITs on every single query they run. I guess this will be costly?