MetaCell / cloud-harness

Exploration Notes: findings on Open Policy Agent #781

Open dvcorreia opened 5 days ago

dvcorreia commented 5 days ago

Context

As part of addressing NGLASS-50, we initially explored using Open Policy Agent (OPA) to determine whether users had the appropriate permissions when making requests to a specific DicomWeb store. Our setup used GoGatekeeper as an OIDC proxy, which natively supports OPA for authorization (documentation here). Given this integration, we found it worthwhile to test OPA for our authorization needs.

[!NOTE]
The responsibility for handling these permission checks was later shifted to the DicomWeb Proxy, and the exploration of OPA was discontinued.

This issue provides a summary of the work conducted during the evaluation of OPA. Some exploration code can be seen in https://github.com/MetaCell/mnp/tree/feature/NGLASS-50.

Open Policy Agent

OPA is a general-purpose policy engine. In its simplest form, we can think of it as a service that takes JSON as input, executes a policy written in Rego, and outputs the result of the policy evaluation as JSON, that is, whether the request is allowed or not.
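To make the JSON-in, JSON-out flow concrete, here is a minimal Python sketch of a client asking a running OPA server for a decision through its REST Data API (`POST /v1/data/<rule path>` with an `{"input": ...}` body). The server address and the rule path passed by the caller are assumptions for illustration; in our setup GoGatekeeper makes this call for us, so this is not code we ran.

```python
import json
import urllib.request


def build_opa_request(opa_url: str, rule_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a POST to OPA's Data API for the given rule path."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/data/{rule_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body: bytes) -> bool:
    """OPA answers {"result": <value>}; the key is absent when the rule is undefined."""
    return json.loads(response_body).get("result", False) is True


def is_allowed(opa_url: str, rule_path: str, input_doc: dict) -> bool:
    """Send the input document to OPA and return the boolean decision."""
    request = build_opa_request(opa_url, rule_path, input_doc)
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_decision(response.read())
```

Treating a missing `result` as a denial keeps the client fail-closed when the rule path is wrong or the rule is undefined.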

What it looked like for our use case

Our use case was evaluating access to DicomWeb stores. Based on the example in the GoGatekeeper documentation, the proxy would send OPA the following input:

{
    "input": {
        "body": "{\"name\": \"test\"}",
        "headers": {
            "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJtZXRhY2VsbC51cyIsInN1YiI6IkpvaG5zb24gJiBKb2huc29uIiwiYXVkIjpbInZpZXdlci5vaGlmLm1ldGFjZWxsLnVzIl0sImV4cCI6MTc4MjQ5MzM3NCwiaWF0IjoxNzE5NDIxMzc0LCJqdGkiOiIwM2RjMTFjYy1jOGFjLTRmNWItYWRjMC03Y2IwNGEyNzdlMTAiLCJncm91cHMiOlsiTWV0YUNlbGwiXX0.bkUqmrJVvCXu4R6JBRK719upUe8VR9SmIGtSMHjMbDE"
        },
        "host": "healthcare.googleapis.com",
        "protocol": "HTTP/2",
        "path": "/v1/projects/MetaCell/locations/us-east4/datasets/ohif-qa-dataset/dicomStores/ohif-qa-2/dicomWeb/studies/12345678",
        "remote_addr": "192.168.1.90",
        "method": "GET",
        "user_agent": "Firefox"
    }
}
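The interesting part of this input is the Bearer token, whose payload carries the `groups` claim (here containing `MetaCell`). As a rough illustration of what can be read from it, the following Python sketch extracts the token from the headers and decodes the payload segment. It deliberately does not verify the signature, on the assumption that the proxy has already done so; the function names are made up for this example.

```python
import base64
import json
from typing import Optional


def bearer_token(headers: dict) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, if any."""
    value = headers.get("Authorization", "")
    prefix = "Bearer "
    return value[len(prefix):] if value.startswith(prefix) else None


def decode_jwt_payload(token: str) -> dict:
    """Decode the JWT payload segment WITHOUT verifying the signature."""
    _header, payload, _signature = token.split(".")
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Calling `decode_jwt_payload(bearer_token(input_doc["headers"]))` on the input above would give the claims dictionary, provided the header is present.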

How can we write a policy to work on this input? Here is the example we worked on:

package metacell.dicomweb.sources.authz

import rego.v1

# Only users of an organization can request DicomWeb stores
# from that organization.
# We default the authorization to false.
default allow := false

# "allow" is the rule we want to evaluate.
# It only evaluates to true if either of its 2 definitions
# below evaluates to true.

# The first one checks two things:
allow if {
    # If the request is a MetaCell dicomweb request
    # (see: is_metacell_dicomweb_request below)
    is_metacell_dicomweb_request

    # Checks if the clinic is in the user's groups claim of its OIDC token.
    metacell_dicomweb_ctx.clinic in claims.groups
}

# The second one also checks two things:
allow if {
    # If the request is a Google dicomweb request
    # (see: is_google_dicomweb_request below)
    is_google_dicomweb_request

    # Checks if the project ID is in the user's groups claim.
    google_dicomweb_ctx.project in claims.groups
}

# The rules below are helpers that extract or evaluate
# the values used above.
# They can also be evaluated separately.

is_metacell_dicomweb_request if {
    glob.match("/clinic/*/store/*/dicomweb/**", ["/"], input.path)
}

metacell_dicomweb_ctx := {
    "clinic": clinic,
    "store": store,
} if {
    path_params := split(input.path, "/")
    clinic := path_params[2]
    store := path_params[4]
}

is_google_dicomweb_request if {
    # Google Healthcare API ref: https://cloud.google.com/healthcare-api/docs/how-tos/dicom
    glob.match("/v1/projects/*/locations/*/datasets/*/dicomStores/*/dicomWeb/**", ["/"], input.path)
}

google_dicomweb_ctx := {
    "project": project,
    "location": location,
    "dataset": dataset,
    "store": store,
} if {
    # Grab the Google Healthcare DicomWeb context from the URL path.
    path_params := split(input.path, "/")
    project := path_params[3]
    location := path_params[5]
    dataset := path_params[7]
    store := path_params[9]
}

claims := payload if {
    # We do not verify the signature on the Bearer token here.
    # GoGatekeeper should have already verified the token.
    # If needed, it could also be verified here.

    # This statement invokes the built-in function `io.jwt.decode` passing the
    # parsed bearer_token as a parameter. The `io.jwt.decode` function returns an
    # array:
    #
    #   [header, payload, signature]
    #
    # In Rego, you can pattern match values using the `=` and `:=` operators. This
    # example pattern matches on the result to obtain the JWT payload.
    [_, payload, _] := io.jwt.decode(bearer_token)
}

bearer_token := t if {
    # Bearer tokens are contained inside of the HTTP Authorization header. This rule
    # parses the header and extracts the Bearer token value. If no Bearer token is
    # provided, the `bearer_token` value is undefined.
    v := input.headers.Authorization
    startswith(v, "Bearer ")
    t := substring(v, count("Bearer "), -1)
}
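For readers less familiar with Rego, the decision the policy makes can be restated as a short Python sketch. This is only an approximation for illustration: the regular expressions stand in for the two `glob.match` patterns and may differ from OPA's glob semantics in edge cases, and the claims are passed in already decoded.

```python
import re

# Regexes approximating the two glob patterns used by the policy.
METACELL_PATH = re.compile(
    r"^/clinic/(?P<clinic>[^/]*)/store/(?P<store>[^/]*)/dicomweb/.*$"
)
GOOGLE_PATH = re.compile(
    r"^/v1/projects/(?P<project>[^/]*)/locations/(?P<location>[^/]*)"
    r"/datasets/(?P<dataset>[^/]*)/dicomStores/(?P<store>[^/]*)/dicomWeb/.*$"
)


def allow(path: str, claims: dict) -> bool:
    """Deny by default; allow only when the path's owner is in the user's groups."""
    groups = claims.get("groups", [])

    match = METACELL_PATH.match(path)
    if match:
        # MetaCell request: the clinic must be one of the user's groups.
        return match.group("clinic") in groups

    match = GOOGLE_PATH.match(path)
    if match:
        # Google Healthcare request: the project ID must be one of the user's groups.
        return match.group("project") in groups

    return False
```

The difference from this sketch is that in Rego each helper rule is simply undefined when its conditions do not hold, which is what makes `allow` fall back to its default of `false`.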

We also wrote some tests to validate this policy. They can be run with opa test. Here are a few of them to show what they look like:

package metacell.dicomweb.sources.authz_test

import rego.v1

import data.metacell.dicomweb.sources.authz

test_metacell_source_w_group_access if {
    authz.allow with input.path as "/clinic/MetaCell/store/1/dicomweb/studies/12345678"
        with io.jwt.decode as [
            true, {
                "iss": "metacell.us",
                "sub": "Johnson & Johnson",
                "aud": ["viewer.ohif.metacell.us"],
                "groups": ["MetaCell"],
            },
            {},
        ]
        with input.headers.Authorization as "Bearer "
}

test_metacell_source_no_group_access if {
    not authz.allow with input.path as "/clinic/MetaCell/store/1/dicomweb/studies/12345678"
        with io.jwt.decode as [
            true, {
                "iss": "metacell.us",
                "sub": "Johnson & Johnson",
                "aud": ["viewer.ohif.metacell.us"],
                "groups": [],
            },
            {},
        ]
        with input.headers.Authorization as "Bearer "
}

test_google_source_w_group_access if {
    authz.allow with input.path as "/v1/projects/MetaCell/locations/us-east4/datasets/ohif-qa-dataset/dicomStores/ohif-qa-2/dicomWeb/studies/12345678"
        with io.jwt.decode as [
            true, {
                "iss": "metacell.us",
                "sub": "Johnson & Johnson",
                "aud": ["viewer.ohif.metacell.us"],
                "groups": ["MetaCell"],
            },
            {},
        ]
        with input.headers.Authorization as "Bearer "
}

test_google_source_no_group_access if {
    not authz.allow with input.path as "/v1/projects/MetaCell/locations/us-east4/datasets/ohif-qa-dataset/dicomStores/ohif-qa-2/dicomWeb/studies/12345678"
        with io.jwt.decode as [
            true, {
                "iss": "metacell.us",
                "sub": "Johnson & Johnson",
                "aud": ["viewer.ohif.metacell.us"],
                "groups": [],
            },
            {},
        ]
        with input.headers.Authorization as "Bearer "
}

Use Cases

Being a general-purpose policy engine, OPA has been integrated into many things. Here are a few that I find interesting:

Application Authorization

You can decouple authorization from business logic, so if the policy changes we do not have to change the code. It is also much more flexible, since we can deploy the same product with different rules. For example, at clinic A employees can access all medical records, but at clinic B that is only possible if the employee works the night shift from 22:00 to 05:00, for privacy and emergency reasons.

An example can be seen in https://www.openpolicyagent.org/ in the Application tab.
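The clinic scenario above can be sketched as follows. This is a hypothetical illustration in Python of the decision such a policy would encode, assuming the rule is checked against the time of the request; with OPA the same logic would live in Rego and per-deployment data rather than in application code. The clinic names and rule layout are invented for the example.

```python
from datetime import time

# Hypothetical per-deployment rules: None means no time restriction.
CLINIC_RULES = {
    "clinic-a": {"restrict_to_shift": None},
    "clinic-b": {"restrict_to_shift": (time(22, 0), time(5, 0))},
}


def in_shift(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in [start, end), handling shifts that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def can_read_records(clinic: str, now: time) -> bool:
    """Deny unknown clinics; otherwise apply the clinic's shift restriction, if any."""
    rule = CLINIC_RULES.get(clinic)
    if rule is None:
        return False
    shift = rule["restrict_to_shift"]
    return shift is None or in_shift(now, *shift)
```

Only `CLINIC_RULES` differs between deployments, which is the kind of separation OPA gives you without redeploying the application.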

[!NOTE]
In Go, OPA can be embedded inside the program binary, so both run inside the same process. I think WASM support is being worked on for the future. For other languages, it is usually run as a sidecar.

API Gateway Authorization

Envoy supports it out of the box, and it is really simple to iterate on. Here is an example:

package envoy.authz

import rego.v1

# Deny by default.
default allow := false

# Allow all GET requests to /pets.
allow if {
    input.attributes.request.http.method == "GET"
    input.attributes.request.http.path == "/pets"
}

Kubernetes Policy Management

Similar to Kyverno, OPA has its own solution for this, called Gatekeeper. The interesting part is the community library, which lets you define policies easily. Here is an example: ensure that all namespaces have a label describing which product they are part of.

After installing Gatekeeper and the CRDs for the library, if you apply the following constraint (the policy "definition") in the cluster:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: all-must-have-product
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    message: "All namespaces must have a `product` label that points to your product (options: celegans, neuroglass, cloudharness). See reference at bla bla."
    labels:
      - key: product
        allowedRegex: "^(celegans|neuroglass|cloudharness)$"

Creating the following namespace will be allowed:

apiVersion: v1
kind: Namespace
metadata:
  name: allowed-namespace
  labels:
    product: celegans

But this one will not be:

apiVersion: v1
kind: Namespace
metadata:
  name: disallowed-namespace
  labels:
    product: invalid
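As a rough mental model of what the constraint enforces, here is a small Python sketch of the check: every required label key must be present, and its value must match the `allowedRegex`. This is an approximation written for illustration, not Gatekeeper's actual implementation, and the violation messages are invented.

```python
import re

# Mirrors the `parameters.labels` section of the constraint above.
REQUIRED_LABELS = [
    {"key": "product", "allowedRegex": "^(celegans|neuroglass|cloudharness)$"},
]


def label_violations(labels: dict, required: list) -> list:
    """Return a list of violation messages; an empty list means the object is admitted."""
    violations = []
    for rule in required:
        key = rule["key"]
        if key not in labels:
            violations.append(f"missing required label: {key}")
        elif not re.search(rule.get("allowedRegex", ""), labels[key]):
            violations.append(
                f"label {key}={labels[key]!r} does not match {rule['allowedRegex']}"
            )
    return violations
```

A namespace is admitted only when the returned list is empty, so both a missing `product` key and a value outside the allowed set lead to a rejection.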

There are many more things that are possible with OPA, but these are the use cases I have experience with.

dvcorreia commented 5 days ago

I tried to run a demo with OPA and GoGatekeeper, but could not get it to run. The demo Docker Compose setup is very out of date. I tried to fix it in https://github.com/dvcorreia/demo-docker-compose, but then NGLASS-50 was paused and deprecated.