Remote GeoServer resource manager implementation

nmco commented 4 years ago

The available budget for this task is 8 days, the project and code for the time-sheet:

LAMMA
C018-LAMMA-2019-SERAPIDE

The idea here is to allow an external system, in this case SERAPIDE, to perform GeoServer resource access management. The usual way of handling this situation is to implement a custom GeoServer resource access manager which queries the external system as needed and builds the custom rules.

Since this is a situation that we have already faced multiple times, we should strive to make something as generic as possible that can be contributed to GeoServer as a community module and reused (as is or as a programming support library). The initial brainstorm is available here.

The following generic functional aspects were identified:

Configure in GeoServer the access to the remote authorization rules management system through an HTTP based API:
- Here we also need to account for authentication, although it is often not required.
Query the remote system for the necessary authorization rules:
- We need to match the resource the system is trying to access with the resources the external system knows about.
- Parse the authorization rules returned by the external system and convert them to data access limits, either for reading or writing.
- Correctly log the interaction with the remote authorization rules management system.
Perform the resource access management, this includes:
- Managing exceptions, for example, what should be done when the user or resource was not identified by the external system, or what should be done if the external system replied with an exception.
- What can we do for situations where a significant number of resources will need to be accessed, for example, for a capabilities document?
A caching layer between GeoServer and the remote system, this includes properly caching the obtained data access limits as well as providing a UI to configure and manage the cache, and a REST endpoint to clear the cache.
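
One possible answer to the exception-handling question above is to fail closed: log the problem and deny access whenever the remote system does not know the resource or replies with an error. The sketch below only illustrates that option and is not a decided behavior; `FailClosedLookup`, its `Access` enum and the `remote` function are hypothetical names that do not come from GeoServer or from the brainstorm doc.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: none of these names exist in GeoServer.
class FailClosedLookup {

    enum Access { ALLOW, DENY }

    private static final Logger LOGGER = Logger.getLogger(FailClosedLookup.class.getName());

    // 'remote' stands in for the HTTP call to the external system (assumption):
    // it returns an empty Optional when the resource is unknown and throws
    // a RuntimeException when the remote system fails.
    static Access lookup(Function<String, Optional<Access>> remote, String resource) {
        try {
            Optional<Access> answer = remote.apply(resource);
            if (!answer.isPresent()) {
                LOGGER.log(Level.WARNING,
                        "Resource {0} is unknown to the remote system, denying access", resource);
                return Access.DENY;
            }
            return answer.get();
        } catch (RuntimeException e) {
            // The remote system replied with an error: log it and deny access.
            LOGGER.log(Level.SEVERE, "Remote authorization system failed for " + resource, e);
            return Access.DENY;
        }
    }
}
```

Failing open (granting access on errors) would be the alternative; which of the two is acceptable depends on the deployment, so it would probably have to be configurable.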

Two groups of external authorization rules management systems were identified:

The ones that are aware of GeoServer layers, layer groups, etc. and have authorization rules explicitly defined for them; this is the case for SERAPIDE.
The ones that don't know about GeoServer layers; authorization rules are associated with more generic resources such as buildings or vessel positions (this is the case for EMSA).

For this first version of the system we should focus on the first group (SERAPIDE), but we should bear in mind the second group, since the community module will later be extended to support it. The initial brainstorm between me and @giohappy is about the first group.

nmco commented 4 years ago

@aaime your opinion on this topic is very welcome :smile:. @taba90 can you start brainstorming about a technical plan and respective estimates? For the UI parts we will need mock-ups :framed_picture:.

In practical terms @taba90 the steps are:

Initial brainstorm and questions.
Prepare a technical plan (described here on this issue) with estimates and mock-ups for the UI.
Ask for @aaime and @giohappy reviews and follow up as needed.
Wait for @giohappy's green light before moving on with the implementation.

taba90 commented 4 years ago

Configuration: a Configuration class with base URL, an optional authentication key, a request timeout, and cache parameters (expire after and size). Then Wicket classes to map the configuration to the UI, with two sections, one for the REST service and one for the cache. 2 days
A class that will query the cache or the remote system for the data access rule (throwing a RuntimeException if it is not able to retrieve the rule?). The class will also take care of initializing the cache according to the parameters passed by the configuration class. If a significant number of resources need to be accessed, we will query the external system with the global path api_url/layers/?user={username}. A REST controller class with the endpoint for clearing the cache. Finally, we would need a class providing methods for converting the rules returned by the remote system to GeoServer access limits. 3 days
Implementing the ResourceAccessManager methods, mapped on the URLs identified in the brainstorm doc (2 days). If I got the point, we would have different implementations depending on whether the external service knows about layers/layer groups or not. So I guess an option in the configuration UI would be needed to mark whether the external system knows about layers and, if not, to show a section or a tab to configure the layer-to-resource mapping?

I didn't get the point on the catalog AccessMode. If we are not retrieving that information from the external service, are we making it configurable from the UI?
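
To make the cache parameters (expire after and size) more concrete, here is a minimal sketch of a size-bounded cache whose entries expire after a fixed time, using only the JDK. `AccessLimitsCache`, `CachedValue` and the injected `clock` are hypothetical names; a real implementation might well rely on an existing caching library instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of the cache described in the plan; not GeoServer code.
class AccessLimitsCache<K, V> {

    // A cached value together with the instant (in milliseconds) at which it expires.
    private static final class CachedValue<T> {
        final T value;
        final long expiresAt;

        CachedValue(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final long expireAfterMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private final Map<K, CachedValue<V>> entries;

    AccessLimitsCache(int maxSize, long expireAfterMillis, LongSupplier clock) {
        this.expireAfterMillis = expireAfterMillis;
        this.clock = clock;
        // Access-ordered map: the least recently used entry is evicted
        // as soon as the configured size is exceeded.
        this.entries = new LinkedHashMap<K, CachedValue<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, CachedValue<V>> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Returns the cached value, or calls the loader (the remote system)
    // when the entry is missing or expired. The loader may throw a RuntimeException.
    synchronized V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        CachedValue<V> cached = entries.get(key);
        if (cached != null && cached.expiresAt > now) {
            return cached.value;
        }
        V loaded = loader.apply(key);
        entries.put(key, new CachedValue<>(loaded, now + expireAfterMillis));
        return loaded;
    }

    // What the REST endpoint for clearing the cache would end up calling.
    synchronized void clear() {
        entries.clear();
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Passing the clock in as a parameter is only there to make expiry testable without sleeping; in production it would simply be `System::currentTimeMillis`.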

giohappy commented 4 years ago

@taba90 you got the point on everything. Just some notes:

Proposing to include the AccessMode in the service response was my mistake. I suppose it should be managed on the GeoServer side, at the RAM level (as it is for the Data security configuration).
I think the UI will provide the option to choose (through a checkbox?) between requesting rules directly for the layer or layer group and requesting them based on a mapping. It is too early to say how we will configure the mapping: it might be a table where a layer/layer group can be selected and assigned to a certain named resource (a free text string field), or a textarea where the mapping can be declared with some custom syntax. Anyway, I think we shouldn't bother with it now. It must be taken into account only because the REM should implement a "resource mapping", which in this first implementation will be one to one with layers.
@nmco and @taba90 one thing I would include is the option to namespace the requests by workspace (as per my recent comment on the brainstorming doc).

giohappy commented 4 years ago

@taba90 @nmco a clarification. In case the "layers/" authorization endpoint returns only a subset of available layers, we should assume a DENY for all the layers outside that list. Do you agree?

This could make the namespacing by workspace optional for the moment.
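
A minimal sketch of the default-DENY behavior proposed above, assuming the response of the "layers/" endpoint has already been parsed into a map from layer name to access level. `LayerAccessResolver` and its `Access` enum are hypothetical names used only for illustration.

```java
import java.util.Map;

// Hypothetical sketch: any layer missing from the remote response is denied.
class LayerAccessResolver {

    enum Access { READ_WRITE, READ_ONLY, DENY }

    // 'rulesFromRemote' is assumed to hold only the layers returned by the
    // remote "layers/" endpoint; everything outside that list resolves to DENY.
    static Access resolve(Map<String, Access> rulesFromRemote, String layerName) {
        return rulesFromRemote.getOrDefault(layerName, Access.DENY);
    }
}
```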

taba90 commented 4 years ago

One thing that is not clear to me: since the ResourceAccessManager is invoked to check layer by layer, requests for multiple layers, e.g. /layers or /workspace/{workspace}/layers, would be meant to run as scheduled batch processes to populate the cache. If I'm right, then /workspace/{workspace}/layers would lose its utility, since we would already have the /layers request to populate the cache... I'm also attaching a mock of the UI:

Mock

giohappy commented 4 years ago

@taba90 actually the usage of the endpoints has yet to be defined. My proposal was generic so that the REM could take advantage of the various options.

In my initial idea I supposed GeoServer was able to request rules in batch from a RAM in case, for example, of a GetCapabilities request. But from what you say I guess layers are always checked one by one. If batch requests are indeed not possible, I agree with you that the workspace/ endpoint doesn't make sense.

By the way @taba90 can you clarify what you mean by "scheduled batch processes"? How do you imagine these scheduled processes would work, how would they be configured, etc.?

taba90 commented 4 years ago

Sorry @giohappy, I was not clear in the above comment, so I'm rephrasing. Since it looks like the layer check is made one by one, and since the brainstorm doc has endpoints aimed at requesting multiple layers at once, I was trying to understand whether that referred to the possibility of scheduling a batch process, let's say every night, that requests the layers' data access limits and fills the cache, including any newly added layers.

giohappy commented 4 years ago

No @taba90, nothing like that was envisioned. The only purpose of it, in my mind, was to leverage a batch request in case GeoServer was able to make one. The GetCapabilities example was chosen on purpose: I thought that in this case GeoServer was able to ask the RAM for the access rules in a single shot, and then having a "batch" endpoint could have been effective. If, however, GeoServer will always ask for rules layer by layer (even for requests that might potentially hit multiple layers), these endpoints are useless.

Cache should be there in any case, of course, to keep the single layer access rules in memory for the configurable amount of time.

giohappy commented 4 years ago

@taba90 @nmco news?

simboss commented 4 years ago

@taba90 what is your estimate for this one? Can you update the estimate control for this issue?

simboss commented 4 years ago

Meanwhile I am turning this into an epic so that before we start working we open issues for the work.

taba90 commented 4 years ago

Configuration: a Configuration class with base URL, an optional authentication key, a request timeout, and cache parameters (expire after and size). Then Wicket classes to map the configuration to the UI, with two sections, one for the REST service and one for the cache. 2 days
A class that will query the cache or the remote system for the data access rule (throwing a RuntimeException if it is not able to retrieve the rule?). The class will also take care of initializing the cache according to the parameters passed by the configuration class. A REST controller class with the endpoint for clearing the cache. Finally, we would need a class providing methods for converting the rules returned by the remote system to GeoServer access limits. 3 days
Implementing the ResourceAccessManager methods, mapped on the urls identified in the brainstorm doc (2 days).

nmco commented 4 years ago

Thx for the update @taba90, this one is on hold for the moment; the client is reorganizing their priorities.

nmco commented 3 years ago

So @simboss and @giohappy is this still relevant?

giohappy commented 3 years ago

The project this request stems from has been closed without this feature. Of course this would be a useful addition for GeoServer, even in the context of GeoNode, but it's not a top priority right now. @simboss what's your opinion? Should we close this for the time being?

geosolutions-it / geoserver

Remote GeoServer resource manager implementation #190