Open-EO / openeo-python-client

Python client API for OpenEO
https://open-eo.github.io/openeo-python-client/
Apache License 2.0
156 stars 42 forks source link

Allow '_ensure_save_result' to be skipped with a flag in any place it is called. #513

Closed GeraldIr closed 2 months ago

GeraldIr commented 12 months ago

Often I find myself being unable to start jobs that have different result nodes which are incompatible with save_result and there has to be some workaround implemented for handling these edge cases.

I'd like to propose an optional flag for methods such as create_job and download which allows ensure save result to be skipped so that no superfluous save_result node gets added.

I can also create a PR for this if a change like this is approved.

jdries commented 12 months ago

Do you have an example of these different result nodes? Maybe the check can also be improved? We are for instance thinking about a save_collection process, that could also have this problem.


From: Gerald Walter Irsiegler @.> Sent: Monday, December 4, 2023 11:44 AM To: Open-EO/openeo-python-client @.> Cc: Subscribed @.***> Subject: [Open-EO/openeo-python-client] Allow '_ensure_save_result' to be skipped with a flag in any place it is called. (Issue #513)

Often I find myself being unable to start jobs that have different result nodes which are incompatible with save_result and there has to be some workaround implemented for handling these edge cases.

I'd like to propose an optional flag for methods such as create_job and download which allows ensure save result to be skipped so that no superfluous save_result node gets added.

I can also create a PR for this if a change like this is approved.

— Reply to this email directly, view it on GitHubhttps://github.com/Open-EO/openeo-python-client/issues/513, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABNJPSH5Q75CQFPLCIQLMK3YHWSQTAVCNFSM6AAAAABAFYEIOOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGAZDGNJVGAYDCMI. You are receiving this because you are subscribed to this thread.Message ID: @.***>

VITO Disclaimer: http://www.vito.be/e-maildisclaimer

GeraldIr commented 12 months ago

We are currently using save_ml_model which stores results of random forest regression, we debated including this functionality in save_result but decided the outputs were different enough to justify its own process (and this might in the future be the case for more things, like UDFs).

I think for more advanced users having the option would be a benefit, and making it optional would save the users relying on it currently from any trouble.

soxofaan commented 11 months ago

is save_ml_model the only use case you're talking about? Because we can easily whitelist that process alongside save_result in the logic.

soxofaan commented 11 months ago

ah no wait, save_ml_model is already whitelisted in MlModel.create_job() so I'm not sure I fully understand the problem here.

can you provide some examples in code snippets?

ValentinaHutter commented 2 months ago

I would like to skip the save_result for some of the sen2like processing and now created a PR to do so: https://github.com/Open-EO/openeo-python-client/pull/627 Not sure, if I miss something, I could not find a whitelist for save_ml_model to use it as an example.

jdries commented 2 months ago

Side note: if you do connection.create_job(your_cube.flat_graph()), no save_result will be added. This is the easy way around. A better way forward would be to fetch the schema for the last process, and only add save_resultif the output is a datacube. Now the thing is, sen2like claims the output is a datacube, so as a user, I would expect that I actually need to add a save_resultprocess, otherwise I won't get valid output. Also the process description does not mention anything in this regards. @ValentinaHutter Is that process metadata correct?

ValentinaHutter commented 2 months ago

Thanks for pointing this out, then I will use the create job with the flat graph for now! The reason for this is that sen2like processor natively generates a Sentinel2 like .SAFE folder holding all the results. In order to make use of other openeo processes on top of that, we then load the data from the .SAFE afterwards. So, when we only want to generate the .SAFE output, we would not need a save_result process afterwards. This is why the return value of sen2like is a datacube. This is described in the process metadata.

soxofaan commented 2 months ago

In order to make use of other openeo processes on top of that, we then load the data from the .SAFE afterwards. So, when we only want to generate the .SAFE output, we would not need a save_result process afterwards.

I'm not completely following: if you don't have a save_result in your first job, you don't really have result assets in your job results so how can you reference to that cube in another job?

Isn't it better to define a "SAFE" file format in your capabilities to be used in this case?

soxofaan commented 2 months ago

I could not find a whitelist for save_ml_model to use it as an example.

FYI: I was referring to this part in create_job of the MlModel class:

https://github.com/Open-EO/openeo-python-client/blob/26bef79f212c1f9157420dd265dabe2d30f9a92a/openeo/rest/mlmodel.py#L108-L110

soxofaan commented 2 months ago

added auto_add_save_result option to download/create_job/execute_batch with cfd9f1a91f15de9ba98b894dd7f038f656232fdc