load_ml_model id: overlap between batch job id and file path?

Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.

https://processes.openeo.org

Apache License 2.0


load_ml_model id: overlap between batch job id and file path? #384

Closed: soxofaan closed this issue 11 months ago.

soxofaan commented 1 year ago

https://github.com/Open-EO/openeo-processes/blob/be148835a725966fbf1fff65215dbf2bca82a4a7/proposals/load_ml_model.json#L23-L33

The batch job id subtype has the pattern "^[\\w\\-\\.~]+$" and the user-uploaded file subtype has the pattern "^[^\r\n\\:'\"]+$" (both as written in the JSON). Both patterns match simple values like foo-bar, so a back-end cannot determine whether a given id is intended as a batch job id or as a user-uploaded file. Or is a back-end supposed to try the available options and pick the first match?
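To illustrate the overlap, here is a minimal Python check using the two patterns from the proposal, decoded from their JSON string escapes into plain regex syntax. The helper classify and the labels "job-id" and "file-path" are hypothetical, for illustration only.

```python
import re

# Batch job id pattern: JSON "^[\\w\\-\\.~]+$" decoded to a regex.
# re.ASCII approximates the ASCII-only \w of JSON Schema (ECMA-262) patterns.
JOB_ID_RE = re.compile(r"^[\w\-\.~]+$", re.ASCII)
# User-uploaded file pattern: JSON "^[^\r\n\\:'\"]+$" decoded, i.e. anything
# except CR, LF, backslash, colon, single quote and double quote.
FILE_PATH_RE = re.compile(r"""^[^\r\n\\:'"]+$""")


def classify(value: str) -> set[str]:
    """Return every subtype (hypothetical labels) that the value matches."""
    matches = set()
    if JOB_ID_RE.match(value):
        matches.add("job-id")
    if FILE_PATH_RE.match(value):
        matches.add("file-path")
    return matches


print(sorted(classify("foo-bar")))               # ['file-path', 'job-id']
print(sorted(classify("12345")))                 # ['file-path', 'job-id']
print(sorted(classify("models/my model.onnx")))  # ['file-path']
```

Any value made only of word characters, "-", "." and "~" matches both patterns, which is exactly the ambiguity raised in the issue.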

m-mohr commented 1 year ago

True, that's not ideal. I don't have a good idea yet of how to resolve it, though...

m-mohr commented 1 year ago

The only thing I can think of right now is to require, e.g., a "./" at the beginning of file paths. This doesn't feel ideal, though. Alternatively, back-ends would need to be given a priority list, I guess... Any opinions or ideas?
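A rough sketch of the "./" prefix idea, assuming a back-end treats any value starting with "./" as a user-uploaded file path and everything else as a batch job id. The function resolve_model_reference and its return shape are hypothetical. Because "/" is not allowed in the batch job id pattern, a "./"-prefixed value can never be mistaken for a job id.

```python
import re

# Batch job id pattern from the proposal, decoded from its JSON escapes.
JOB_ID_RE = re.compile(r"^[\w\-\.~]+$", re.ASCII)


def resolve_model_reference(value: str) -> tuple[str, str]:
    """Hypothetical resolver in which "./" marks a user-uploaded file path."""
    if value.startswith("./"):
        # Strip the marker; the remainder is the path in the user workspace.
        return ("file-path", value[2:])
    if JOB_ID_RE.match(value):
        return ("job-id", value)
    raise ValueError(f"not a valid batch job id: {value!r}")


print(resolve_model_reference("foo-bar"))    # ('job-id', 'foo-bar')
print(resolve_model_reference("./foo-bar"))  # ('file-path', 'foo-bar')
```

One trade-off of this convention: a path without the prefix, such as models/x.onnx, is rejected instead of being guessed.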

soxofaan commented 1 year ago

Another option is to require some kind of scheme prefix when there is a risk of confusion, e.g. workspace://mymodel.foo. This extends easily to other storage solutions or "namespaces" (s3://..., https://..., or even job:... and job://otherbackend/...).
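A minimal sketch of scheme-based dispatch, assuming the prefixes listed in the comment (workspace:, s3:, https:, job:). Both subtype patterns in the proposal already exclude ":", so a leading "scheme:" cannot collide with a plain job id or file path. The KNOWN_SCHEMES labels, the dispatch function and the handling of unprefixed values are assumptions, not part of any openEO specification.

```python
import re

# "scheme:" or "scheme://" prefix; the scheme syntax follows RFC 3986.
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):(?://)?(.*)$")

# Hypothetical mapping from scheme prefix to the kind of source it names.
KNOWN_SCHEMES = {
    "workspace": "user workspace file",
    "s3": "S3 object",
    "https": "external URL",
    "job": "batch job result",
}


def dispatch(reference: str) -> tuple[str, str]:
    """Return (source kind, location); the location drops "scheme:" and "//"."""
    m = SCHEME_RE.match(reference)
    if m is None:
        # Assumption: unprefixed values keep the old, ambiguous handling.
        return ("unprefixed", reference)
    scheme, location = m.group(1).lower(), m.group(2)
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return (KNOWN_SCHEMES[scheme], location)


print(dispatch("workspace://mymodel.foo"))   # ('user workspace file', 'mymodel.foo')
print(dispatch("job://otherbackend/12345"))  # ('batch job result', 'otherbackend/12345')
```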

m-mohr commented 1 year ago

Yeah, I thought about that too, but I found that even more confusing/unexpected. 🤔

m-mohr commented 1 year ago

Maybe define two processes? Alternatively, I'd say we specify that it tries the batch job ID first and otherwise loads the file (because you can change file names but not batch job IDs).
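A minimal sketch of the "batch job ID first, then file" rule, assuming the back-end can look up the user's job ids and workspace files. The in-memory sets and the function resolve are hypothetical stand-ins for real back-end lookups.

```python
def resolve(value: str, job_ids: set[str], workspace_files: set[str]) -> tuple[str, str]:
    """Prefer a batch job id; fall back to a user-uploaded file."""
    # Job ids are checked first because they cannot be renamed,
    # whereas a user can rename a file that conflicts with a job id.
    if value in job_ids:
        return ("job-id", value)
    if value in workspace_files:
        return ("file-path", value)
    raise LookupError(f"no batch job or file named {value!r}")


jobs = {"foo-bar", "j-1234"}
files = {"foo-bar", "models/rf.onnx"}
print(resolve("foo-bar", jobs, files))         # ('job-id', 'foo-bar')
print(resolve("models/rf.onnx", jobs, files))  # ('file-path', 'models/rf.onnx')
```

Note that under this rule a file named foo-bar is silently shadowed by a job with the same id; the user's only remedy is to rename the file.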

m-mohr commented 1 year ago

I thought a bit more about this, and I think we can simply remove the batch job id subtype. A batch job ID can also be provided via the uri subtype. Either you provide the canonical (i.e. "public") link, in which case it's just like any external data, or you provide a URI such as https://example.com/api/v1.0/jobs/12345/results, from which you can easily detect the job ID. Clients can help with this, for example by accepting Job objects as input.
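A rough sketch of detecting the job ID from such a URI, assuming the back-end knows its own API root (here the example https://example.com/api/v1.0 from the comment) and that job results live at /jobs/{job_id}/results below it, as in the openEO API. The function job_id_from_uri and the convention of returning None for external data are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumption: this back-end's API root, taken from the example URI above.
API_ROOT = "https://example.com/api/v1.0"
# Batch job results path below the API root; the id uses the job id pattern.
JOB_RESULTS_RE = re.compile(r"^/jobs/([\w\-\.~]+)/results/?$", re.ASCII)


def job_id_from_uri(uri: str, api_root: str = API_ROOT) -> Optional[str]:
    """Return the job id if uri points at this back-end's job results, else None."""
    root = urlsplit(api_root)
    parts = urlsplit(uri)
    if (parts.scheme, parts.netloc) != (root.scheme, root.netloc):
        return None  # another host: treat it like any external data
    root_path = root.path.rstrip("/")
    if not parts.path.startswith(root_path + "/"):
        return None
    m = JOB_RESULTS_RE.match(parts.path[len(root_path):])
    return m.group(1) if m else None


print(job_id_from_uri("https://example.com/api/v1.0/jobs/12345/results"))  # 12345
```

Anything that does not resolve to a local job (None here) would simply be loaded like any other URI, which is why a separate batch job id subtype is no longer needed.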

I created PR #413 for it.

m-mohr commented 11 months ago

Solved by #413, I think.