Open-EO / openeo-api

The openEO API specification
http://api.openeo.org
Apache License 2.0

Consider using a standardized API for file access #135

Status: open. jdries opened this issue 5 years ago.

jdries commented 5 years ago

We have currently defined our own API for sharing files with openEO. The S3 API is also a well-known HTTP-based file API (object storage). I'm not an expert, so this is really more of a question: could we investigate whether it would be usable? If S3 covers all of our requirements, using it would simplify our own API and also back-end implementations, as it is very widely adopted and supported by existing software.

m-mohr commented 5 years ago

I'd really like to adopt a well-known/well-defined API for file management. I'm also not an expert in S3 or other potential file-related HTTP APIs. Does anybody here have experience with them? A first look at the S3 REST API makes me feel that it is a bit too complex for our "simple" implementations. I'm not sure yet whether that API could be stripped down to allow just a minimal subset, which I'd say is mandatory to keep the openEO API simple in this regard. If it needs to be fully implemented, I could only see it being added as an extension. Are there other file APIs we could adopt? I found the Azure and Google APIs, of course.

edzer commented 5 years ago

Ideally this would be back-end heterogeneity that we would abstract away in the openEO API.

m-mohr commented 5 years ago

@edzer That is what we are trying at the moment with our file API, but it is a bit limited and proprietary. If there is a standard with a good ecosystem, it would be a good idea to adopt it. I'm not sure whether the existing cloud services such as S3, Azure and GCS could handle that, as they usually have service-specific things in their APIs. So we are basically looking for an existing standard that has already done the abstraction. If there is none, we'll probably continue with what we have at the moment.

edzer commented 5 years ago

GDAL supports the /vsi prefixes (/vsizip/, /vsis3/, /vsigcs/, etc.; see here), which abstract over many cases operationally, i.e. it is a working implementation. It does mean that a script needs to be adapted when porting from AWS to GCS.

edzer commented 5 years ago

But that might be OK (and could even be automated).
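As a rough illustration of what "automated" porting could mean, the sketch below swaps one GDAL /vsi prefix for another. The helper name `port_vsi_path` and the provider keys are hypothetical (not a GDAL API), and it assumes the same bucket/object layout exists on both providers, which real migrations may not guarantee.

```python
# Hypothetical helper (not part of GDAL): rewrite a GDAL virtual file system
# path when a script is ported between cloud providers. It assumes the same
# bucket/object layout exists on both providers.
VSI_PREFIXES = {
    "s3": "/vsis3/",     # Amazon S3
    "gcs": "/vsigcs/",   # Google Cloud Storage
    "azure": "/vsiaz/",  # Azure Blob Storage
}


def port_vsi_path(path: str, source: str, target: str) -> str:
    """Swap the /vsi prefix of `path` from `source` to `target`."""
    src = VSI_PREFIXES[source]
    dst = VSI_PREFIXES[target]
    if not path.startswith(src):
        raise ValueError(f"{path!r} does not start with {src}")
    return dst + path[len(src):]


if __name__ == "__main__":
    print(port_vsi_path("/vsis3/my-bucket/scene.tif", "s3", "gcs"))
    # → /vsigcs/my-bucket/scene.tif
```

Chained paths such as /vsizip//vsis3/... are not handled by this sketch; a fuller version would need to rewrite the inner prefix as well.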

m-mohr commented 5 years ago

@edzer As discussed, that could be useful for back-end implementations, but I don't see a direct benefit for the API specification. I'm more looking for something like a simple and "modern" WebDAV.

m-mohr commented 5 years ago

Maybe remoteStorage is what we are looking for: https://remotestorage.io/ Some thoughts: https://unterwaditzer.net/2015/kill-webdav.html

The only thing that is more complex in remoteStorage than in WebDAV is authentication. RemoteStorage requires the server to support a subset of OAuth, and that's the only kind of authentication supported. It also requires WebFinger support instead of making it optional (like in WebDAV, where it's almost a luxury if the DAV client actually finds the HTTP endpoints it's supposed to use).
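To make the WebFinger requirement concrete: a client turns a `user@host` address into a lookup URL (RFC 7033) and then picks the storage link out of the returned JSON document. The sketch below shows both steps; the function names are hypothetical, and matching the link on the substring "remotestorage" is an assumption, since the exact `rel` value has differed between remoteStorage draft versions.

```python
from typing import Optional
from urllib.parse import urlencode


def webfinger_url(user_address: str) -> str:
    """Build the RFC 7033 WebFinger query URL for a `user@host` address."""
    user, sep, host = user_address.rpartition("@")
    if not sep or not user or not host:
        raise ValueError("expected an address of the form user@host")
    # urlencode percent-encodes ':' and '@' inside the resource value.
    query = urlencode({"resource": f"acct:{user_address}"})
    return f"https://{host}/.well-known/webfinger?{query}"


def find_storage_root(jrd: dict) -> Optional[str]:
    """Return the href of the first link whose rel mentions remoteStorage.

    Assumption: matching on the substring "remotestorage" is a loose
    heuristic, because the exact rel value changed between draft versions.
    """
    for link in jrd.get("links", []):
        if "remotestorage" in link.get("rel", ""):
            return link.get("href")
    return None
```

For `alice@example.org`, the lookup URL is `https://example.org/.well-known/webfinger?resource=acct%3Aalice%40example.org`. The OAuth step that remoteStorage also requires is not covered here.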

Sounds great, but I'm wondering how we can integrate it, given that we would need to merge the openEO and remoteStorage authentication procedures somehow.

Another interesting repo to look at is https://github.com/scality/cloudserver

mkadunc commented 5 years ago

Maybe remoteStorage is what we are looking for: https://remotestorage.io/

I also like remoteStorage a lot, but it has a long way to go before it replaces S3 API as the go-to REST interface.

The industry seems to have settled on S3's interface for object storage - in addition to scality/cloudserver, many other solutions use the same API or provide an S3-compatible proxy to GCS and others, e.g. Min.io, Ceph and OpenStack Swift.
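One practical consequence of that convergence is that a client usually only needs a different endpoint to talk to an S3-compatible server. The sketch below builds object URLs in the two S3 addressing styles; the helper name, the `http://localhost:9000` endpoint (a typical local self-hosted setup) and the bucket name are illustrative assumptions, and request signing is left out.

```python
from urllib.parse import quote


def s3_object_url(endpoint: str, bucket: str, key: str, path_style: bool = True) -> str:
    """Build the URL of an object on AWS S3 or an S3-compatible server."""
    scheme, sep, host = endpoint.partition("://")
    if not sep or not host:
        raise ValueError("endpoint must look like scheme://host[:port]")
    host = host.rstrip("/")
    encoded_key = quote(key, safe="/")  # keep '/' so keys can look like paths
    if path_style:
        # scheme://host/bucket/key: common for self-hosted servers
        return f"{scheme}://{host}/{bucket}/{encoded_key}"
    # scheme://bucket.host/key: virtual-hosted style, used by AWS
    return f"{scheme}://{bucket}.{host}/{encoded_key}"
```

For example, `s3_object_url("http://localhost:9000", "openeo-files", "user 1/result.tif")` gives `http://localhost:9000/openeo-files/user%201/result.tif`, while the virtual-hosted form against `https://s3.eu-central-1.amazonaws.com` gives `https://openeo-files.s3.eu-central-1.amazonaws.com/result.tif`.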

m-mohr commented 5 years ago

Conclusion from 3rd year planning:

If S3 is not manageable for back-ends to implement, we'll fall back to what we have at the moment.

mkadunc commented 5 years ago

For Sinergise, S3 (or a subset thereof) would be the preferred interface for file access and management.

A Swagger 2.0 spec generated using https://github.com/APIs-guru/aws2openapi (looks quite current): https://github.com/APIs-guru/openapi-directory/blob/master/APIs/amazonaws.com/s3/2006-03-01/swagger.yaml

m-mohr commented 4 years ago

Thanks @mkadunc, appreciate the links!

The Swagger file looks quite complicated (it is about 8,000 lines; the openEO API is not even half as long). Also, the generated version seems to have some issues regarding compatibility with OpenAPI. S3 has many endpoints and it's not clear to me what they are all for, especially as they use fragments (e.g. /{Bucket}#publicAccessBlock), which doesn't look very "RESTish". Which of them do we actually need? GET/PUT/DELETE for /{Bucket} and /{Bucket}/{Key}? In the end, I (we?) would need advice on how to integrate S3 in a way that is compatible with its ecosystem.

mkadunc commented 4 years ago

I suggest we focus mostly on the Object operations and leave management of buckets up to the back-end (it seems that's how we started anyway) - from the Bucket operations we'll probably only need GET (list objects).

I suggest we keep the openEO-mandated subset of supported API calls as small as possible, i.e. only the minimum required for basic functioning of openEO web editor.
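A minimal subset along these lines could be checked with a small router: GET/PUT/DELETE on /{Bucket}/{Key} and GET (list objects) on /{Bucket}, with everything else rejected. The operation names follow the S3 API reference, but the subset itself, the allowed listing parameters and the `route` function are hypothetical, not an agreed openEO profile.

```python
import re
from typing import Optional, Tuple
from urllib.parse import parse_qsl

# Hypothetical minimal subset: object GET/PUT/DELETE plus bucket listing.
OBJECT_OPS = {"GET": "GetObject", "PUT": "PutObject", "DELETE": "DeleteObject"}
LIST_PARAMS = {"prefix", "delimiter", "max-keys", "marker",
               "list-type", "continuation-token", "start-after"}

_OBJECT_PATH = re.compile(r"^/(?P<bucket>[^/]+)/(?P<key>.+)$")
_BUCKET_PATH = re.compile(r"^/(?P<bucket>[^/]+)/?$")


def route(method: str, path: str, query: str = "") -> Optional[Tuple[str, str, Optional[str]]]:
    """Map a request to (operation, bucket, key), or None if outside the subset."""
    # keep_blank_values so sub-resources such as "?acl" are not silently dropped
    params = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    obj = _OBJECT_PATH.match(path)
    if obj:
        # Object sub-resources (versions, multipart uploads, ...) are excluded.
        if method in OBJECT_OPS and not params:
            return OBJECT_OPS[method], obj["bucket"], obj["key"]
        return None
    bucket = _BUCKET_PATH.match(path)
    if bucket and method == "GET" and params <= LIST_PARAMS:
        # Simplification: ListObjects and ListObjectsV2 are not distinguished.
        return "ListObjects", bucket["bucket"], None
    return None
```

A back-end could answer any request for which `route` returns `None` with an error such as 501 Not Implemented, so clients can tell an unsupported S3 feature apart from a missing file.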

m-mohr commented 4 years ago

Makes sense. We still need to figure out the minimum set of endpoints a back-end would have to implement.

What I don't like at all about S3 is that it mandates a different authentication procedure (HMAC?) than the one we currently use, which is the same reason we rejected remoteStorage.io. Also, the endpoints use XML, which we tried at all costs to avoid mixing with JSON. So after a (quick) look, I have more concerns about implementing it.
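For reference, the S3 REST API authenticates requests with AWS Signature Version 4, which is indeed HMAC-based (HMAC-SHA256) rather than a bearer token like openEO's, and bucket listings come back as an XML `ListBucketResult`. The sketch below shows the SigV4 signing-key derivation and a conversion of a listing into JSON; building the canonical request and string-to-sign is omitted, and the output shape (`files` entries with `path`, `size`, `modified`, plus `links`) is an assumption modelled on the openEO file listing, not a confirmed mapping.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the SigV4 signing key; `date` is YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sigv4_signature(signing_key: bytes, string_to_sign: str) -> str:
    """Hex signature placed in the Authorization header (string-to-sign not built here)."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def list_bucket_to_json(xml_text: str) -> dict:
    """Convert an S3 ListBucketResult document into a JSON-style dict."""
    root = ET.fromstring(xml_text)
    files = []
    for item in root.findall("s3:Contents", S3_NS):
        files.append({
            "path": item.findtext("s3:Key", namespaces=S3_NS),
            "size": int(item.findtext("s3:Size", default="0", namespaces=S3_NS)),
            "modified": item.findtext("s3:LastModified", namespaces=S3_NS),
        })
    return {"files": files, "links": []}
```

A thin translation layer like this is one way a back-end could keep JSON and bearer tokens on the openEO side while storing files in an S3-compatible service behind it, although it would not give clients direct S3 compatibility.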

m-mohr commented 4 years ago

No updates yet according to the dev telco today.

m-mohr commented 4 years ago

@jdries Any news on this? I'll move to "future" until there are new insights posted here.