Permify / permify

An open-source authorization as a service inspired by Google Zanzibar, designed to build and manage fine-grained and scalable authorization systems for any application.
https://docs.permify.co/
Apache License 2.0

[BUG] Garbage Collector removing relation still valid #1471

Open gign0766 opened 3 weeks ago

gign0766 commented 3 weeks ago

Describe the bug When the same relation is present twice, one valid and the other expired, the Garbage Collector invalidates the relation even though a valid one still exists.

To Reproduce Steps to reproduce the behavior:

Set up Permify with a PostgreSQL database and configure the garbage collector as follows:

database:
  engine: postgres
  uri: 
  garbage_collection:
     enabled: true
     interval: 10m
     window: 5m
     timeout: 5m

Add a relation, then add the same relation (for the same tenant) a bit later

For example: [screenshot attached in the original issue]
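Concretely, the duplicate write in the steps above can be issued through the REST data-write endpoint. The payload below reflects my reading of the Permify docs, so treat the endpoint path and field names as assumptions to double-check:

```
POST /v1/tenants/4849b21a-df59-45b2-ab1e-e05bb1db8f28/data/write

{
  "metadata": { "schema_version": "" },
  "tuples": [
    {
      "entity": { "type": "organisation", "id": "1" },
      "relation": "member",
      "subject": { "type": "user", "id": "b56661f8-7be6-4342-a4c0-918ee04e5983" }
    }
  ]
}
```

Sending this twice (a while apart) is what produces the two copies of the tuple.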

The permission check works until the garbage collector runs; after that, the permission is considered invalid.

Expected behavior The expired relation should be removed from the database, but the permission check should still be allowed since there is a valid relation.

Additional context The Permify instance runs via the Helm chart, as does the Postgres instance.

Environment:

ucatbas commented 3 weeks ago

Hi @gign0766, I tried to reproduce the issue but couldn't replicate it. Everything seems to be working as expected on my end. I'll update the database tests shortly to include this scenario. Also, can you provide the specific permission check requests you used to ensure I don't miss any details?

gign0766 commented 3 weeks ago

Hi, I couldn't reproduce the problem outside of the Helm version. When I tried to recreate it in a Docker environment, everything worked as expected.

The permission check was : POST http://localhost:13476/v1/tenants/4849b21a-df59-45b2-ab1e-e05bb1db8f28/permissions/check

{
  "metadata": {
    "schema_version": "",
    "snap_token": "",
    "depth": 20
  },
  "entity": {
    "type": "organisation",
    "id": "1"
  },
  "permission": "is_member",
  "subject": {
    "type": "user",
    "id": "b56661f8-7be6-4342-a4c0-918ee04e5983"
  }
}

The test was made through the Go SDK:

cr, err := client.Permission.Check(ctx, &permifyPayload.PermissionCheckRequest{
    TenantId: tenantId,
    Metadata: &permifyPayload.PermissionCheckRequestMetadata{
        SnapToken:     "",
        SchemaVersion: "",
        Depth:         20,
    },
    Entity:     tuple.Entity,
    Permission: tuple.Relation,
    Subject:    tuple.Subject,
})

and with the following schema:

entity user {}

entity organisation {
    relation admin @user
    relation member @user
    relation billing @user
    relation guest @user

    permission is_member = admin or member or billing or guest
}

// Billing
entity billing {
    relation admin @organisation#admin
    relation billing @organisation#billing

    permission is_member = admin or billing

    permission full_access = is_member
    permission read_only = is_member
}

// Project
entity project {
    relation admin @organisation#admin
    relation member @organisation#member

    permission is_member = admin or member

    permission full_access = admin
    permission read_only = is_member
}

entity instance {
    relation project @project

    permission full_access = project.full_access
    permission read_only = project.read_only
}

gign0766 commented 3 weeks ago

To add some information on the environment: it's a Kubernetes cluster with 3 worker nodes. The Permify Helm chart has the default replica count (3), so each worker node runs one Permify instance.

gign0766 commented 2 weeks ago

Hello,

I just noticed there is a distributed configuration for the cache system. Could this be the cause of the problem?

I'll try to configure it on the cluster and keep you informed of the result.
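For anyone else hitting this: the distributed (consistent-hashing) mode is enabled via the config file. The snippet below reflects my reading of the Permify docs, so the field names and the resolver address should be double-checked rather than taken as authoritative:

```yaml
distributed:
  enabled: true
  # gRPC resolver address for the consistent-hash balancer (value as given in the docs)
  address: "demo:///grpc.demo.consistent-hash-balancer"
  port: "5053"
```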

ucatbas commented 2 weeks ago

Hi, thank you @gign0766. I couldn't test with your setup yet. This week I will be updating the Helm charts, the related migration scripts, and the k8s issues. I will test both cases!

tolgaOzen commented 2 weeks ago

Hi @gign0766 , the issue might be related to the version of Postgres you're using. Permify supports Postgres 13.8 and above.

cc: @ucatbas

gign0766 commented 2 weeks ago

Hi @tolgaOzen My bad, I forgot to mention that it's the Bitnami Helm chart. The Postgres version is 15.3 (image: docker.io/bitnami/postgresql:15.3.0-debian-11-r17).

So far, since I've activated the distributed system, the problem hasn't shown up again.