equinor / fmu-sumo

Interaction with Sumo in the FMU context
https://fmu-sumo.readthedocs.io/en/latest/
Apache License 2.0

Recommended `async` compatible way of iterating over document collection #228

Closed anders-kiaer closed 9 months ago

anders-kiaer commented 9 months ago

We have some code snippets with patterns like:

def some_function():
    surface_collection: SurfaceCollection = case.surfaces.filter(...)

    for surf in surface_collection:
        ...

We now want to make some_function async and, at the same time, ensure that no synchronous calls are made when interacting with the surface collection. What is the suggested pattern for iterating over a DocumentCollection in async functions?


Some possibilities might include:

1) This uses functions that already exist, I think, but it feels a bit clunky:

N = await surface_collection.length_async()
for i in range(N):
    surf = await surface_collection.getitem_async(i)
    ...

2) In our scenarios, we have typically filtered so much that we know the document collection is always small enough to fetch with a single POST request. Maybe something like this could work:

await surface_collection.fetch_all_documents()
for surf in surface_collection:  # All documents already loaded, this is not I/O blocking anymore.
    ...

3) I think the reason we can iterate over DocumentCollections today is that __getitem__ is defined. This is, however, apparently a legacy Python way of providing iteration (https://stackoverflow.com/a/20551346). Could we define __iter__ and __aiter__ so that both sync and async iteration are possible (https://stackoverflow.com/a/75376959)? I.e. `for surf in surface_collection:` and `async for surf in surface_collection:`.
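
To make option 3 concrete, here is a minimal sketch using a toy in-memory stand-in, not the actual fmu-sumo DocumentCollection. ToyCollection and collect_async are hypothetical names made up for illustration; length_async and getitem_async only mirror the method names used in option 1, and asyncio.sleep(0) is a placeholder for the real network calls.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


class ToyCollection:
    """Hypothetical stand-in for a document collection (not the fmu-sumo class)."""

    def __init__(self, documents: List[str]) -> None:
        # Assumption: documents are held in memory instead of behind an API.
        self._documents = documents

    def __len__(self) -> int:
        return len(self._documents)

    def __getitem__(self, index: int) -> str:
        # Sync access; in a real client this could be a blocking request.
        return self._documents[index]

    async def length_async(self) -> int:
        await asyncio.sleep(0)  # placeholder for non-blocking I/O
        return len(self._documents)

    async def getitem_async(self, index: int) -> str:
        await asyncio.sleep(0)  # placeholder for non-blocking I/O
        return self._documents[index]

    def __iter__(self) -> Iterator[str]:
        # Explicit sync iteration instead of relying on the __getitem__ fallback.
        for index in range(len(self)):
            yield self[index]

    async def __aiter__(self) -> AsyncIterator[str]:
        # Written as an async generator, so calling it returns an async iterator
        # that only uses the async accessors.
        for index in range(await self.length_async()):
            yield await self.getitem_async(index)


async def collect_async(collection: ToyCollection) -> List[str]:
    # "async for" goes through __aiter__, never through __iter__/__getitem__.
    return [doc async for doc in collection]


collection = ToyCollection(["surf_a", "surf_b", "surf_c"])
sync_result = list(collection)                          # uses __iter__
async_result = asyncio.run(collect_async(collection))  # uses __aiter__
```

With something along these lines, existing sync callers could keep their plain for loops, while async callers switch to async for without triggering any of the sync accessors. Whether the real class would fetch documents one by one or in batches inside __aiter__ is left open here.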

adnejacobsen commented 9 months ago

I think option 3 sounds good; enabling the `async for ... in ...` syntax looks like the most user-friendly solution. Will look into __iter__ and __aiter__ 👍