Closed GeraldIr closed 2 months ago
Do you have an example of these different result nodes? Maybe the check can also be improved? We are for instance thinking about a save_collection process, that could also have this problem.
From: Gerald Walter Irsiegler @.> Sent: Monday, December 4, 2023 11:44 AM To: Open-EO/openeo-python-client @.> Cc: Subscribed @.***> Subject: [Open-EO/openeo-python-client] Allow '_ensure_save_result' to be skipped with a flag in any place it is called. (Issue #513)
Often I find myself being unable to start jobs that have different result nodes which are incompatible with save_result and there has to be some workaround implemented for handling these edge cases.
I'd like to propose an optional flag for methods such as create_job and download which allows ensure save result to be skipped so that no superfluous save_result node gets added.
I can also create a PR for this if a change like this is approved.
— Reply to this email directly, view it on GitHubhttps://github.com/Open-EO/openeo-python-client/issues/513, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABNJPSH5Q75CQFPLCIQLMK3YHWSQTAVCNFSM6AAAAABAFYEIOOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGAZDGNJVGAYDCMI. You are receiving this because you are subscribed to this thread.Message ID: @.***>
VITO Disclaimer: http://www.vito.be/e-maildisclaimer
We are currently using save_ml_model which stores results of random forest regression, we debated including this functionality in save_result but decided the outputs were different enough to justify its own process (and this might in the future be the case for more things, like UDFs).
I think for more advanced users having the option would be a benefit, and making it optional would save the users relying on it currently from any trouble.
is save_ml_model
the only use case you're talking about? Because we can easily whitelist that process alongside save_result
in the logic.
ah no wait, save_ml_model
is already whitelisted in MlModel.create_job()
so I'm not sure I fully understand the problem here.
can you provide some examples in code snippets?
I would like to skip the save_result
for some of the sen2like
processing and now created a PR to do so: https://github.com/Open-EO/openeo-python-client/pull/627 Not sure, if I miss something, I could not find a whitelist for save_ml_model
to use it as an example.
Side note: if you do connection.create_job(your_cube.flat_graph())
, no save_result will be added. This is the easy way around.
A better way forward would be to fetch the schema for the last process, and only add save_result
if the output is a datacube.
Now the thing is, sen2like claims the output is a datacube, so as a user, I would expect that I actually need to add a save_result
process, otherwise I won't get valid output. Also the process description does not mention anything in this regards.
@ValentinaHutter Is that process metadata correct?
Thanks for pointing this out, then I will use the create job with the flat graph for now! The reason for this is that sen2like processor natively generates a Sentinel2 like .SAFE folder holding all the results. In order to make use of other openeo processes on top of that, we then load the data from the .SAFE afterwards. So, when we only want to generate the .SAFE output, we would not need a save_result process afterwards. This is why the return value of sen2like is a datacube. This is described in the process metadata.
In order to make use of other openeo processes on top of that, we then load the data from the .SAFE afterwards. So, when we only want to generate the .SAFE output, we would not need a save_result process afterwards.
I'm not completely following: if you don't have a save_result in your first job, you don't really have result assets in your job results so how can you reference to that cube in another job?
Isn't it better to define a "SAFE" file format in your capabilities to be used in this case?
I could not find a whitelist for save_ml_model to use it as an example.
FYI: I was referring to this part in create_job
of the MlModel
class:
added auto_add_save_result
option to download/create_job/execute_batch with cfd9f1a91f15de9ba98b894dd7f038f656232fdc
Often I find myself being unable to start jobs that have different result nodes which are incompatible with save_result and there has to be some workaround implemented for handling these edge cases.
I'd like to propose an optional flag for methods such as create_job and download which allows ensure save result to be skipped so that no superfluous save_result node gets added.
I can also create a PR for this if a change like this is approved.