cs3org / reva

WebDAV/gRPC/HTTP high performance server to link high level clients to storage backends
https://reva.link
Apache License 2.0

Experience of starting out with Reva as a total noob! #2169

Open marcolarosa opened 2 years ago

marcolarosa commented 2 years ago

Hi, I hope this is ok - I thought I'd note down my experience of getting started with Reva. I may be way off with this so please be gentle!

Getting started

The documentation at reva.link seemed to assume some knowledge on the part of the user. I'm not sure how to explain this but it comes from being overwhelmed by all the pieces. For my use case I needed to start up a reva instance so I could develop against it. But that wasn't trivial to do.

In this case it would have been nice to start from a repo with a functional docker setup. I have that now: https://github.com/Arkisto-Platform/describo-reva.

Reva docker containers

I couldn't find any reva docker containers, so I put a Dockerfile into my repo to build a container with both reva and revad. Is there a reva container on Docker Hub somewhere?

Reva services

Reading the docs I followed https://reva.link/docs/tutorials/setup-tutorial/ which led me to https://github.com/cs3org/reva/tree/master/examples/storage-references. When I looked through those files I saw that multiple services were being started on different ports, so the first version of my repo set them up as separate containers. But with that setup I couldn't auth to reva on port 19000 and then talk to the home service in another container on port 17000.

Does everything go through the gateway?

Maybe this is obvious but I assumed that one would auth to the gateway on 19000 and then get a list of backend endpoints to talk to directly. When I set up multiple containers this mode of operation didn't work. So I combined everything back into a single container and then just interacted with the gateway (from rclone in case it matters) and it worked as expected. That threw me - I assumed the point of separate service endpoints was to scale the load - is that assumption incorrect?

If it can be set up as multiple containers, I couldn't figure out how to configure it so that all the services could talk to each other.

Authenticating to reva

In my use there is a requirement to authenticate against reva. Whilst I could do a POST with basic auth to http://...:19000/ the service would return a 404. Interestingly, the response had X-Access-Token set as a header. Is there an authentication route that I should be using?

X-Access-Token and rclone

My application uses rclone on the API side to talk to various backend storage systems: OneDrive, S3, ownCloud and now reva. I tried using the X-Access-Token with rclone but it didn't work with reva. Does reva only support basic auth at the moment? Rclone requires the user password to be obscured in the config, but that is easily reversible. And I don't like having to store a user's password anyway, so being able to auth, get a token and then use that would be ideal.

CORS

In my SPA, I would like to be able to auth to reva directly, but to do that reva needs to set CORS headers. I couldn't figure out how to set the CORS headers on the gateway by configuration (it seems to be supported but not yet documented), so I worked around it by putting an nginx proxy in front that added the CORS headers for me. Obviously, that's not ideal. But it got me thinking: is reva supposed to be behind a real HTTP server?

I reached out for help in https://github.com/cs3org/reva/issues/2156 - closed and replaced by this ticket.

Getting a preview link to a file

I don't know if this is possible but it would be great to be able to get a short lived (1 hour) link to a file for preview purposes. Something like S3's presigned url. That way, a calling service like mine can just embed the url directly instead of having to stream the content via the API.

Documentation

The documentation needs work. Again I come back to the assumed-knowledge comment at the start. Perhaps the tutorials could be structured in a simple-to-complicated format: use repo X to set up a simple, one-container reva environment, or go to repo Y to set up a multi-container environment with services configured to talk to each other over the network...

And of course, fleshing out the documentation would be great! It seems reva can do a lot but the docs only scratch the surface.

HTTP routes

I found - totally by accident - that reva has HTTP routes just by spelunking around the tickets (https://github.com/cs3org/reva/issues/1923#issuecomment-891908827). But I couldn't find any central registry of routes in the source code.

uselessbusinessclown commented 2 years ago

Hi Marco, have you taken a look at OCIS? Should be far easier to consume than "raw reva"? See https://github.com/owncloud/ocis

marcolarosa commented 2 years ago

@uselessbusinessclown thank you for the info! I hope what comes next doesn't offend you, as it's directed at the reva developers.

To the reva developers.

I'm going to close this ticket as 12 days have passed and no-one has reached out to help. I thought providing feedback as a new user would be useful (developer to developer) but it seems I was wrong. So rather than have it closed because it's stale, I'll close it myself.

Whilst I'm aware that there is much politicking happening around this project I'm not in the least bit interested. I am an application developer who is trying to deploy a metadata application onto the Science Mesh. In the current iteration it can talk to reva via webdav and that meets my requirements.

That said, I think reva is potentially a brilliant solution to my storage woes (i.e. talking to many different backends), and working with the APIs would certainly be a better solution. However, if you don't help, and until the documentation improves, reva is useless to me.

Apologies for being so blunt, but reva should be to storage what TCP is to HTTP. It's a transport-layer concern that should work well and never come up in conversation. The application layer is the bit to focus on, but in my limited experience so far, it's about the transport layer and not the applications.

If I'm out of line, please re-open the ticket and address the concerns I raised. In return, I will apologise for being too quick to judge.

uselessbusinessclown commented 2 years ago

Well, you said no one has reached out to help, yet fully ignored my "possibly very helpful" comment. shrug

marcolarosa commented 2 years ago

Hi @uselessbusinessclown, fair comment :-). Your comment actually was helpful. I went looking at the OCIS docs you pointed me at, but at the moment I'm stuck in a bit of a tough spot not of my own making, and at this stage OCIS is not the solution. Out of my hands for the moment....

But thank you FWIW!

uselessbusinessclown commented 2 years ago

Well, look at the OCIS Architecture Graph; that's where Reva is used and what it was built for, especially in the context of CS3MESH4EOSC.

marcolarosa commented 2 years ago

> Well, look at the OCIS Architecture Graph; that's where Reva is used and what it was built for, especially in the context of CS3MESH4EOSC.

I will do. Thank you again.

labkode commented 2 years ago

Hi @marcolarosa, I'm re-opening this issue as your comments were not addressed. Apologies for only getting to this issue now, but we get many issues to go through! The best way to reach us is on Gitter: https://gitter.im/cs3org/REVA.

I'll take a look at your comments and give you clarity on them, hopefully today.

marcolarosa commented 2 years ago

Hi @labkode. Thank you. I very much appreciate you reaching out to me!

Let me add some other things I've found by trying to use the CS3 APIs. These experiences are probably more related to those repos, but since this is a general discussion it might save you addressing some of the initial questions I raised.

I went to use the CS3 APIs from Node.js. From https://github.com/cs3org/cs3apis I found that the JS implementation is at https://github.com/cs3org/js-cs3apis. When I installed that and tried to import GatewayAPIClient, it errored. I've written up a bug report for this at https://github.com/cs3org/js-cs3apis/issues/5. When I put in the hack I detail there, I got an error saying that XMLHttpRequest is not available. That made me think this is the wrong library for use in Node.js environments. If this is correct, I couldn't see it documented anywhere.

After some more digging I found https://github.com/cs3org/node-cs3apis (which is not referenced from the main repo) and I was able to import the client and auth to my reva instance. But the version of this lib is 0.0.26, whereas js-cs3apis is at 0.0.35. There is a ticket (from someone else) asking how these two repos are related, but it hasn't been addressed and I couldn't figure out the difference. It almost seems like the newer repo is intended as a replacement for the older one? But only for use in browser environments (which would actually make my integration a bit easier)?

After playing around with the Node.js APIs a bit I started to get a feel for how the API documentation relates to the actual code. For example, when you auth and get back a response, you can call the methods getUser(), getUser().getMail() and getUser().getUidNumber() on the response, which seem to map to the properties at https://cs3org.github.io/cs3apis/#cs3.identity.user.v1beta1.User. Notice the naming? camelCase the property name and put get in front. Have I got that right?

I looked for tests that might illuminate usage and I couldn't find any. Is this because the JS/Node.js libs are generated from the Go libraries?

I couldn't find any repos showing a basic integration that demonstrates auth and file listing or something equally simple. I'm happy to help create one for Node.js if you like.
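The naming pattern noticed above is consistent, as far as we understand it, with how the JavaScript protobuf code generator names accessors: the proto field name is converted to UpperCamelCase and prefixed with get, and repeated fields additionally get a List suffix. A small Go sketch of that mapping follows; the function name getterName is hypothetical, and only plain lower-case snake_case names are handled (digits and other edge cases are not covered).

```go
package main

import (
	"fmt"
	"strings"
)

// getterName converts a lower-case snake_case proto field name into the accessor
// name used by generated JavaScript protobuf classes: "get" + UpperCamelCase,
// plus a "List" suffix for repeated fields. Only ASCII snake_case is handled.
func getterName(field string, repeated bool) string {
	var b strings.Builder
	b.WriteString("get")
	for _, part := range strings.Split(field, "_") {
		if part == "" {
			continue // tolerate doubled or trailing underscores
		}
		b.WriteString(strings.ToUpper(part[:1]))
		b.WriteString(part[1:])
	}
	if repeated {
		b.WriteString("List")
	}
	return b.String()
}

func main() {
	// Field names taken from the cs3.identity.user.v1beta1.User message.
	for _, f := range []string{"mail", "uid_number", "display_name"} {
		fmt.Println(f, "->", getterName(f, false))
	}
	fmt.Println("groups ->", getterName("groups", true))
}
```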

labkode commented 2 years ago

> Hi, I hope this is ok - I thought I'd note down my experience of getting started with Reva. I may be way off with this so please be gentle!

Getting started

> The documentation at reva.link seemed to assume some knowledge on the part of the user. I'm not sure how to explain this but it comes from being overwhelmed by all the pieces. For my use case I needed to start up a reva instance so I could develop against it. But that wasn't trivial to do.

I see that this part wasn't clear for you. I'm happy to hear more about how we could add more information or restructure the guides; any suggestion is welcome, but we need to make it concrete.

> In this case it would have been nice to start from a repo with a functional docker setup. I have that now: https://github.com/Arkisto-Platform/describo-reva.

Reva docker containers

> I couldn't find any reva docker containers, so I put a Dockerfile into my repo to build a container with both reva and revad. Is there a reva container on Docker Hub somewhere?

They are available on Docker Hub: https://hub.docker.com/r/cs3org/revad

I think we're missing a link to them in the "Getting started" section of our documentation. There are also Kubernetes charts around, in case you're interested. @SamuAlfageme

Reva services

> Reading the docs I followed https://reva.link/docs/tutorials/setup-tutorial/ which led me to https://github.com/cs3org/reva/tree/master/examples/storage-references. When I looked through those files I saw that multiple services were being started on different ports, so the first version of my repo set them up as separate containers. But with that setup I couldn't auth to reva on port 19000 and then talk to the home service in another container on port 17000.

You don't need a docker setup for that; you can just run revad -dev-dir /etc/myconfigs and reva will start the different services. You can have all those configs inside just one container.

Does everything go through the gateway?

> Maybe this is obvious but I assumed that one would auth to the gateway on 19000 and then get a list of backend endpoints to talk to directly. When I set up multiple containers this mode of operation didn't work. So I combined everything back into a single container and then just interacted with the gateway (from rclone in case it matters) and it worked as expected. That threw me - I assumed the point of separate service endpoints was to scale the load - is that assumption incorrect?

The "gateway" service is a proxy for metadata requests and the "datagateway" is a proxy for data requests. All the actions happen through them, including authentication. Regarding data transfers, one can also expose an internal URL to the client; however, this assumes that the client knows how to authenticate to the storage and speaks its protocol.

> If it can be set up as multiple containers, I couldn't figure out how to configure it so that all the services could talk to each other.

If you're using docker-compose, can't you use service names rather than hostnames?

Authenticating to reva

> In my use there is a requirement to authenticate against reva. Whilst I could do a POST with basic auth to http://...:19000/ the service would return a 404. Interestingly, the response had X-Access-Token set as a header. Is there an authentication route that I should be using?

Do you have a reason not to authenticate using the gRPC CS3APIs from your client? gRPC is the most straightforward way to achieve any functionality in Reva. The HTTP APIs are just wrappers around the gRPC services.

Basic auth is also possible; the configuration you're using is probably not enabling it.
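For the HTTP route, the behaviour described in the question (a basic-auth POST answered with 404 but carrying X-Access-Token) can be captured by a client that reads the header regardless of status. The Go sketch below assumes basic auth is enabled and that the token comes back in that header; the root URL, the demo credentials and the fake server are placeholders, not a documented reva login route.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchAccessToken sends a POST with HTTP basic auth and returns the value of the
// X-Access-Token response header. The status code is deliberately not checked,
// because the thread reports a 404 that still carried a token (an assumption).
func fetchAccessToken(client *http.Client, url, user, pass string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(user, pass)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	token := resp.Header.Get("X-Access-Token")
	if token == "" {
		return "", fmt.Errorf("no X-Access-Token in response (status %d)", resp.StatusCode)
	}
	return token, nil
}

// newFakeGateway is a hypothetical stand-in for the reva HTTP endpoint: it issues
// a token for one known user and otherwise answers 404, as seen in the thread.
func newFakeGateway() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		user, pass, ok := r.BasicAuth()
		if ok && user == "einstein" && pass == "relativity" {
			w.Header().Set("X-Access-Token", "fake-token-123")
		}
		http.NotFound(w, r)
	}))
}

func main() {
	srv := newFakeGateway()
	defer srv.Close()
	token, err := fetchAccessToken(srv.Client(), srv.URL+"/", "einstein", "relativity")
	fmt.Println(token, err)
}
```

With a real deployment, the gRPC authentication suggested above avoids depending on this side effect of the HTTP endpoint.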

X-Access-Token and rclone

> My application uses rclone on the API side to talk to various backend storage systems: OneDrive, S3, ownCloud and now reva. I tried using the X-Access-Token with rclone but it didn't work with reva. Does reva only support basic auth at the moment? Rclone requires the user password to be obscured in the config, but that is easily reversible. And I don't like having to store a user's password anyway, so being able to auth, get a token and then use that would be ideal.

You can definitely pass a token in the X-Access-Token header once you have obtained it from the user (via basic auth, OIDC, etc.). There is also a mechanism (the machine auth strategy) that uses a shared secret between applications to impersonate users.

CORS

> In my SPA, I would like to be able to auth to reva directly, but to do that reva needs to set CORS headers. I couldn't figure out how to set the CORS headers on the gateway by configuration (it seems to be supported but not yet documented), so I worked around it by putting an nginx proxy in front that added the CORS headers for me. Obviously, that's not ideal. But it got me thinking: is reva supposed to be behind a real HTTP server?

Reva supports CORS as well, but that documentation page is still empty: https://reva.link/docs/config/http/middlewares/cors/

And yes, in most production deployments you'll have another proxy in front, like NGINX that can proxy both gRPC and HTTP traffic.

> I reached out for help in #2156 - closed and replaced by this ticket.

Getting a preview link to a file

> I don't know if this is possible but it would be great to be able to get a short lived (1 hour) link to a file for preview purposes. Something like S3's presigned url. That way, a calling service like mine can just embed the url directly instead of having to stream the content via the API.

@marcolarosa OCIS adds this functionality on top of Reva for public links; however, I don't know whether it is exposed in a way you can call as an API.

Documentation

> The documentation needs work. Again I come back to the assumed-knowledge comment at the start. Perhaps the tutorials could be structured in a simple-to-complicated format: use repo X to set up a simple, one-container reva environment, or go to repo Y to set up a multi-container environment with services configured to talk to each other over the network...

> And of course, fleshing out the documentation would be great! It seems reva can do a lot but the docs only scratch the surface.

This is definitely an area to improve.

HTTP routes

> I found - totally by accident - that reva has HTTP routes just by spelunking around the tickets (#1923 (comment)). But I couldn't find any central registry of routes in the source code.

There isn't a single file where all the routes are defined. Each service declares its own routes, for example: https://github.com/cs3org/reva/blob/master/internal/http/services/owncloud/ocs/ocs.go#L102

Having a single place with all the routes is definitely more developer friendly; I wonder if we could keep the same extensibility with such a central place.

marcolarosa commented 2 years ago

@labkode Thank you for the information. I've put the responses inline.

> I see that this part wasn't clear for you. I'm happy to hear more about how we could add more information or restructure the guides; any suggestion is welcome, but we need to make it concrete.

OK. Here are some thoughts on the documentation on reva.link:

  1. Write the empty documentation. There's lots of it, and you yourself refer to the empty CORS documentation further on.
  2. Provide a developer documentation section. There is no reference that I can find on that site to the CS3 APIs or the libraries to interact with them. The API documentation would make a good addition to that site rather than being something you get to by following a GitHub trail.
  3. On the topic of developer documentation, provide a more detailed example, along the lines of the one in the node-cs3apis README (https://github.com/cs3org/node-cs3apis), that shows authentication, file listing, and maybe a couple of simple operations - download a file, create a file, etc. API documentation is rarely meaningful without some example code to help guide understanding of the intentions of the library developers.
  4. Getting started: the beginner's guide should point to a functioning reva setup that a new user can just run via docker without needing to know anything about configuration, something like the describo-reva setup I pointed at. Then the documentation can walk the user through that configuration, explaining the components and the why. From there you can lead into more scalable (exascale) configurations and the gotchas to consider when running in production at scale (e.g. in production it runs behind an edge server like nginx which proxies HTTP and gRPC).
  5. Spend far more energy explaining the why of reva (with graphics) on the concepts page. That page is severely underbaked.

Reva services

> Reading the docs I followed https://reva.link/docs/tutorials/setup-tutorial/ which led me to https://github.com/cs3org/reva/tree/master/examples/storage-references. When I looked through those files I saw that multiple services were being started on different ports, so the first version of my repo set them up as separate containers. But with that setup I couldn't auth to reva on port 19000 and then talk to the home service in another container on port 17000.

> You don't need a docker setup for that; you can just run revad -dev-dir /etc/myconfigs and reva will start the different services. You can have all those configs inside just one container.

Well, as a developer I don't want to pollute my machine with every little thing I need to test. :-) So a docker setup is absolutely essential for me. But in terms of how I set it up, I did end up doing it the way you suggest. My point, however, is that running multiple services inside one docker container is not the docker way. That's why I initially went with the multi-container approach after reading the gateway configuration file. It doesn't matter in my case, but this leads back to documentation: someone wanting to run reva in production using docker would likely want one service per container. How is that done? Is there a repo with a working docker environment to start from? If not, why not? It seems it would take the experts a few hours to set one up, and then it's done and people like me can't complain. :-)

Does everything go through the gateway?

> Maybe this is obvious but I assumed that one would auth to the gateway on 19000 and then get a list of backend endpoints to talk to directly. When I set up multiple containers this mode of operation didn't work. So I combined everything back into a single container and then just interacted with the gateway (from rclone in case it matters) and it worked as expected. That threw me - I assumed the point of separate service endpoints was to scale the load - is that assumption incorrect?

> The "gateway" service is a proxy for metadata requests and the "datagateway" is a proxy for data requests. All the actions happen through them, including authentication. Regarding data transfers, one can also expose an internal URL to the client; however, this assumes that the client knows how to authenticate to the storage and speaks its protocol.

Again, the concepts page needs more work, because there's a massive disconnect between the concepts as written and the configuration examples in the repo.

> If it can be set up as multiple containers, I couldn't figure out how to configure it so that all the services could talk to each other.

> If you're using docker-compose, can't you use service names rather than hostnames?

I tried. :-) I couldn't get the multi-container setup to work. But let's not focus on this.

Authenticating to reva

> In my use there is a requirement to authenticate against reva. Whilst I could do a POST with basic auth to http://...:19000/ the service would return a 404. Interestingly, the response had X-Access-Token set as a header. Is there an authentication route that I should be using?

> Do you have a reason not to authenticate using the gRPC CS3APIs from your client? gRPC is the most straightforward way to achieve any functionality in Reva. The HTTP APIs are just wrappers around the gRPC services.

> Basic auth is also possible; the configuration you're using is probably not enabling it.

This is an excellent question! Two weeks ago I didn't know there were APIs! I read the reva.link site, saw what was there and thought "OK - let's figure this out". Last night, after you wrote to me, I added another comment to this thread. It goes through my experience of using the JS libraries to talk gRPC to the CS3 APIs. Here's the link: https://github.com/cs3org/reva/issues/2169#issuecomment-951787019

I'll wait for your feedback on that.

X-Access-Token and rclone

> My application uses rclone on the API side to talk to various backend storage systems: OneDrive, S3, ownCloud and now reva. I tried using the X-Access-Token with rclone but it didn't work with reva. Does reva only support basic auth at the moment? Rclone requires the user password to be obscured in the config, but that is easily reversible. And I don't like having to store a user's password anyway, so being able to auth, get a token and then use that would be ideal.

> You can definitely pass a token in the X-Access-Token header once you have obtained it from the user (via basic auth, OIDC, etc.). There is also a mechanism (the machine auth strategy) that uses a shared secret between applications to impersonate users.

I'm not sure we're talking about the same thing here, but perhaps I'm not understanding your response. I couldn't get rclone to talk to the WebDAV endpoint using the token I got in the browser when doing a POST to reva at '/'. Again, let's not focus on this. Getting the documentation sorted is far more important than solving a use case that may cease to be one going forward.

> Reva supports CORS as well, but that documentation page is still empty: https://reva.link/docs/config/http/middlewares/cors/

> And yes, in most production deployments you'll have another proxy in front, like NGINX that can proxy both gRPC and HTTP traffic.

Documentation.

Getting a preview link to a file

> I don't know if this is possible but it would be great to be able to get a short lived (1 hour) link to a file for preview purposes. Something like S3's presigned url. That way, a calling service like mine can just embed the url directly instead of having to stream the content via the API.

> @marcolarosa OCIS adds this functionality on top of Reva for public links; however, I don't know whether it is exposed in a way you can call as an API.

So here's a question. In my app it would be good to show users a preview of their file while they're writing the metadata for it. If reva cannot provide this functionality and OCIS can, then why wouldn't I just talk to OCIS for everything? Having to talk to both OCIS and reva for different bits of the capability I need would unnecessarily complicate my code. Will reva be providing this capability? If so, when; if not, why not?

Documentation

> The documentation needs work. Again I come back to the assumed-knowledge comment at the start. Perhaps the tutorials could be structured in a simple-to-complicated format: use repo X to set up a simple, one-container reva environment, or go to repo Y to set up a multi-container environment with services configured to talk to each other over the network... And of course, fleshing out the documentation would be great! It seems reva can do a lot but the docs only scratch the surface.

> This is definitely an area to improve.

Agreed!

> Having a single place with all the routes is definitely more developer friendly; I wonder if we could keep the same extensibility with such a central place.

Less important if the reva libraries work. And at this stage I've had mixed results with them.

marcolarosa commented 2 years ago

Just to follow up. I had a very useful and productive meeting with @labkode and learnt the following:

In addition, I've implemented the capabilities I need and have also produced a repo with some documentation and sample code for others to get started with. There was still quite a bit of assumed knowledge for a non-gRPC expert like me, but once you see a few examples it's fairly easy to figure out the rest, as the pattern is consistent.

The repo is at https://github.com/Arkisto-Platform/reva-tutorial