elixir-cloud-aai / drs-filer

Lightweight, flexible Flask/Gunicorn-based GA4GH DRS implementation
Apache License 2.0

feat: plugins/middleware/hooks #65

Open jvkersch opened 3 months ago

jvkersch commented 3 months ago

Following the example of proTES, it would be useful to have a plugin/middleware/hooks system to provide additional functionality to DRS-filer, or to modify existing behaviour. Such extra functionality could e.g. include support for crypt4gh (as implemented in a plugin-less way in pa-DRS-Crypt4GH-PoC).

This is a first, rough design document to describe such a plugin system. Caveat: I only know of one realistic example of a plugin so far (offering support for Crypt4GH). If we can find a few more, we can check whether the proposed design is general enough to support all use cases.

Considerations/context

Tentative design

Given that the plugin should be able to interfere with the behaviour of each endpoint at two moments in time (after the request has been parsed, and before the response is serialized), this suggests having two dedicated methods per endpoint, as in the design below:

class DummyMiddleware:

  def pre_GetObject(self, object_id):
    # Code that is run before GetObject is called goes here
    pass

  def post_GetObject(self, object_id, object):
    # Code that is run after GetObject returns goes here
    pass

  def pre_getServiceInfo(self):
    # ...
    pass

  def post_getServiceInfo(self, service_info):
    # ...
    pass

  # Other endpoints go here

Note how the pre/post methods follow the signature of the endpoint that they wrap. Plugins do not have to implement all methods, just the ones for which they have functionality to contribute.
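To make the "implement only what you need" part concrete, here is a minimal sketch of how the server side could dispatch to these hooks. Everything in it is an assumption for illustration: `call_with_hooks`, `TagMiddleware` and `get_object` are hypothetical names, and the rule that a post hook returning `None` leaves the result untouched is just one possible convention, not something DRS-filer or proTES prescribes.

```python
from typing import Any, Callable, Iterable


def call_with_hooks(
    middlewares: Iterable[Any],
    operation: str,
    controller: Callable[..., Any],
    *args: Any,
) -> Any:
    """Run optional pre_/post_ hooks around a controller function."""
    middlewares = list(middlewares)

    # Pre hooks receive the endpoint arguments; plugins without one are skipped.
    for mw in middlewares:
        pre = getattr(mw, f"pre_{operation}", None)
        if callable(pre):
            pre(*args)

    result = controller(*args)

    # Post hooks receive the endpoint arguments plus the current result.
    # Assumption: a hook returning None leaves the result unchanged.
    for mw in middlewares:
        post = getattr(mw, f"post_{operation}", None)
        if callable(post):
            updated = post(*args, result)
            if updated is not None:
                result = updated
    return result


class TagMiddleware:
    """Hypothetical plugin that implements a single hook only."""

    def post_GetObject(self, object_id, obj):
        return {**obj, "tagged": True}


def get_object(object_id):
    # Stand-in for the real GetObject controller.
    return {"id": object_id}


result = call_with_hooks([TagMiddleware()], "GetObject", get_object, "abc")
```

Looking hooks up with `getattr` keeps plugins free of any base class, at the cost of silently ignoring misspelled method names.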

Advantages/disadvantages

Example plugin (Crypt4GH)

This plugin has to offer two pieces of functionality.

  1. It has to advertise that the server has support for Crypt4GH encryption. This is done through an entry in the service info dictionary.
  2. When a user requests an access URL, it has to provide a re-encrypted version of the object pointed to by the access URL. This is done by retrieving the user's public key from the header, issuing a call to a reencrypt function, and returning a suitably modified access URL.

from flask import current_app, request

class Crypt4GHMiddleware:

  def post_getServiceInfo(self, service_info):
    # load_server_pubkey is a placeholder helper that is not defined here
    server_pubkey = load_server_pubkey()
    service_info["crypt4gh"] = {
      "version": "1.0",
      "server_pubkey": server_pubkey,
    }
    return service_info

  def post_GetAccessURL(self, object_id, access_id, access_url):
    # The client's public key is read from a request header
    client_pubkey = request.headers.get("Crypt4Gh-Pubkey")
    crypt4gh_conf = getattr(current_app.config.foca, "crypt4gh", None)
    # reencrypt is a placeholder helper that is not defined here
    access_url = reencrypt(access_url, client_pubkey, crypt4gh_conf)
    return access_url

Note that the specific implementation is not subject to any standard, and is likely to change in the future.

uniqueg commented 3 months ago

Good stuff, thanks a lot.

Recently, I was looking into upgrading FOCA from Connexion 2 to Connexion 3 (https://connexion.readthedocs.io/en/latest/v3.html#migrating-from-connexion-2), and I stumbled across a possible alternative.

Connexion 3 is a major rewrite that is built on Starlette (instead of Flask) to migrate from WSGI to ASGI (though Flask is still supported via some WSGI-to-ASGI compatibility layer). Importantly, Connexion 3 now applies basically all its functionalities via a Starlette-based middleware stack: https://www.starlette.io/middleware/
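For readers who have not seen this style before, here is a rough sketch of a "pure ASGI" middleware, the lowest-level form described in the Starlette middleware docs. It needs no imports from Starlette itself. The class name `AddHeaderMiddleware`, the `x-crypt4gh` header and the `demo_app`/`run_once` helpers are made up for illustration and are not part of Connexion, Starlette or DRS-filer.

```python
import asyncio


class AddHeaderMiddleware:
    """Pure ASGI middleware that appends one header to every HTTP response."""

    def __init__(self, app, name: bytes, value: bytes):
        self.app = app
        self.name = name
        self.value = value

    async def __call__(self, scope, receive, send):
        # Pass non-HTTP traffic (lifespan, websocket) through untouched.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            # Headers travel in the "http.response.start" message.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((self.name, self.value))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_wrapper)


async def demo_app(scope, receive, send):
    # Stand-in for the wrapped application.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"{}"})


def run_once(app, path="/"):
    """Drive an ASGI app with one fake GET request and collect its messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent


messages = run_once(AddHeaderMiddleware(demo_app, b"x-crypt4gh", b"supported"))
```

In a Starlette application such a class would typically be registered with `add_middleware`; how exactly that would be exposed through FOCA is still open.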

So, given that FOCA underlies basically all of our services and we are planning to migrate to Connexion 3 as soon as I manage to put in the time to do so (I had already started, and the first migration was about 75% done when the summer break hit me), we could also consider making use of Starlette middlewares.

Advantages:

Disadvantages:

I've put "highly generic" as both an advantage and disadvantage, because having a bit of structure might make development easier, or at least more consistent. However, a pre-defined structure can also make things more restrictive, especially because we can't foresee all use cases yet.

I guess what we really need to focus on is the design of a mechanism that checks when (and when not) a middleware applies. That way we don't need to write different methods for different operations, but rather include all middlewares in the stack and just make sure that each one only runs if the right conditions are met.
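One possible shape for such a check, sketched under heavy assumptions: a small wrapper that decides per request, based on HTTP method and path, whether the wrapped middleware runs or the request goes straight to the app. `ConditionalMiddleware`, `recording_middleware` and the regular expression are hypothetical; the path follows the DRS access URL route under the `/ga4gh/drs/v1` base path, but a real implementation might rather match on the OpenAPI operation ID.

```python
import asyncio
import re


class ConditionalMiddleware:
    """Run a wrapped ASGI middleware only when method and path match."""

    def __init__(self, app, middleware_factory, methods, path_pattern):
        self.app = app
        # Build the real middleware once, around the same downstream app.
        self.wrapped = middleware_factory(app)
        self.methods = set(methods)
        self.path_pattern = re.compile(path_pattern)

    def applies(self, scope):
        return (
            scope["type"] == "http"
            and scope["method"] in self.methods
            and self.path_pattern.fullmatch(scope["path"]) is not None
        )

    async def __call__(self, scope, receive, send):
        # Skip the middleware entirely when the conditions are not met.
        target = self.wrapped if self.applies(scope) else self.app
        await target(scope, receive, send)


calls = []


def recording_middleware(app):
    # Hypothetical middleware that just records which paths it handled.
    async def middleware(scope, receive, send):
        calls.append(scope["path"])
        await app(scope, receive, send)

    return middleware


async def demo_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"{}"})


ACCESS_URL_PATH = r"/ga4gh/drs/v1/objects/[^/]+/access/[^/]+"
app = ConditionalMiddleware(
    demo_app, recording_middleware, {"GET"}, ACCESS_URL_PATH
)


async def call(path):
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass

    await app({"type": "http", "method": "GET", "path": path}, receive, send)


asyncio.run(call("/ga4gh/drs/v1/objects/abc/access/xyz"))
asyncio.run(call("/ga4gh/drs/v1/service-info"))
```

With this shape every middleware can sit in the stack permanently, and the conditions live in one declarative place instead of in per-operation methods.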

jvkersch commented 3 months ago

@uniqueg I agree that a pre-defined framework would be the better option. The main weakness of my proposal is that there's currently only one example, and a custom framework risks being premature.

I guess what we really need to focus on is the design of a mechanism that checks when (and when not) a middleware applies.

Yes! We can use the intervening time (until the migration to Connexion 3 is complete) to figure this out.