dotmesh-io / dotmesh

dotmesh (dm) is like git for your data volumes (databases, files etc) in Docker and Kubernetes
https://dotmesh.com
Apache License 2.0
538 stars 29 forks source link

#793: pluggable user management for Dotscience #794

Closed alaric-dotmesh closed 4 years ago

alaric-dotmesh commented 4 years ago

When we implement RBAC in the gateway, decided who can access what dot on the hub is going to involve a lot of dotscience-specific knowledge about organisations and roles. Rather than trying to teach Dotscience that, we want to make Dotscience delegate it all back to the gateway. That's what this issue is for. As a side effect, we'll be able to stop this syncing of users between dotmesh and the gateway once and for all.

I've moved access control into the existing UserManager interface for simplicity, as asking what users can do what is pertinent to user management, and it puts everything we want to delegate into one place. An environment variable (and corresponding dm init flag so we can easily enable it in tests) switches Dotmesh into using an external user manager rather than the normal internal one.

The HTTP API between dotmesh and gw is a direct translation of the underlying dotmesh UserManager API, warts and all, as I didn't want to spend the time refactoring all of that now. That means it doesn't really fit REST conventions, but it'll do. Although the dotmesh repo is public, we're not making this a "feature" of dotmesh for public consumption and not expecting third parties to implement the API themselves. The client and server for the protocol are both in the dotmesh repo so can evolve in synch together if we feel inclined to neaten it up in future.

I inspected the dotmesh code to write tests to trigger all the user manager API calls.

rusenask commented 4 years ago

the main issue of no router and no handlers that could be created separately and then tested with HTTP test server still persists

alaric-dotmesh commented 4 years ago

the main issue of no router and no handlers that could be created separately and then tested with HTTP test server still persists

Is that a show-stopper? :-(

  1. What do we really need a router for? I mean, my switch statement is a router, in that it routes the requests to the right handlers.
  2. It's easy to spin up a real server on a random port to test it, an approach I prefer over things like NewTestServer() in the gateway - which creates an artificial environment that differs from reality in subtle ways, pushing more testing burden onto the e2e tests (are there any tests in the gateway repo that actually test normal server startup code paths? I've not seen any!)
rusenask commented 4 years ago

the main issue of no router and no handlers that could be created separately and then tested with HTTP test server still persists

Is that a show-stopper? :-(

1. What do we really need a router for? I mean, my switch statement _is_ a router, in that it routes the requests to the right handlers.

2. It's easy to spin up a real server on a random port to test it, an approach I prefer over things like `NewTestServer()` in the gateway - which creates an artificial environment that differs from reality in subtle ways, pushing more testing burden onto the e2e tests (are there any tests in the gateway repo that actually test normal server startup code paths? I've not seen any!)

While it would be probably simple to rewrite it later, there's little point in doing something super unusual like this. It should be split into:

  1. Create a router
  2. Register handlers (so handlers can then be tested)
  3. Start server taking a router (and we can skip this if we want during tests)

It's good to avoid using fmt.Sprintf() in any hot path and this server would be called all the time. Sprintf is pretty much a template where Go has to do allocation, create a template and then we do matching against it. But even this is not really important when you look at a bigger picture where we ended up with this mega function that can't easily tested. There are plenty of examples like this where we get from day 1 some hard to test code and then spend 10x more time trying to cover them with tests :)

I can help tomorrow with this function to split it a bit and make it more manageable.

alaric-dotmesh commented 4 years ago

It's good to avoid using fmt.Sprintf() in any hot path and this server would be called all the time. Sprintf is pretty much a template where Go has to do allocation, create a template and then we do matching against it.

I think that's a drop in the ocean compared to the general overheads involved in an HTTP request, and it's quite possible that the overheads of fmt.Sprintf are less than the overheads of mux's internals (looking at the router.Match method in the source, and the number of nested loops it uses, and what looks like a regexp matcher)...

But even this is not really important when you look at a bigger picture where we ended up with this mega function that can't easily tested.

It's already got tests! Which work by sending in actual HTTP requests so cover the entire stack faithfully!

But, if it's what it takes to get this PR accepted, I'll do it, even though in my opinion it's unnecessary work that will make this module longer and harder to read.

alaric-dotmesh commented 4 years ago

But, if it's what it takes to get this PR accepted, I'll do it, even though in my opinion it's unnecessary work that will make this module longer and harder to read.

Done.