ansible / django-ansible-base

Apache License 2.0
18 stars 45 forks source link

Fix `IntegrityError` when giving overlapping team permissions #401

Closed AlanCoding closed 5 months ago

AlanCoding commented 5 months ago

We hit another version of https://github.com/ansible/django-ansible-base/pull/281 from out in the wild.

The report of this is https://github.com/ansible/awx/issues/15185, and although it is short on specific of what went on, with some speculation I was able to mostly logically brute-force out the probable cause. The imagined reproducer:

The result might be a little race-y, but considering how long the transactions are open for I don't think it should be very difficult to hit. The reproducer is given here, which follows the same pattern as my linked prior fix. This is not possible to do in unit tests because of the reasons given there (checkpoints vs proper transactions).

from django.contrib.contenttypes.models import ContentType
from test_app.models import *
from ansible_base.rbac.models import *
from django.db import transaction

Process 1

org = Organization.objects.create(name='repro')
team = Team.objects.create(name='repro-new', organization=org)
inv = Inventory.objects.create(organization=org, name='for-repro')
rd = RoleDefinition.objects.create_from_permissions(name='inv-view', permissions=['view_inventory'], content_type=ContentType.objects.get_for_model(Inventory))

u = User.objects.last()
member_rd = RoleDefinition.objects.get(name='Team Member')
member_rd.give_permission(u, team)

transaction.atomic().__enter__()
assignment = rd.give_permission(team, inv)
# wait for process 2 to hang
transaction.atomic().__exit__(None, None, None)

Process 2

team = Team.objects.get(name='repro-new', organization__name='repro')
inv = Inventory.objects.get(name='for-repro', organization__name='repro')
rd2 = RoleDefinition.objects.create_from_permissions(name='inv-view-del', permissions=['view_inventory', 'delete_inventory'], content_type=ContentType.objects.get_for_model(Inventory))

with transaction.atomic():
    assignment = rd2.give_permission(team, inv)
    # will hang, return to process 1

After you do the work to run all this, you should see the IntegrityError happen in Process 2 after the last action in Process 1. The head of that:

IntegrityError                            Traceback (most recent call last)
Cell In[4], line 2
      1 with transaction.atomic():
----> 2         assignment = rd2.give_permission(team, inv)

File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/models.py:187, in RoleDefinition.give_permission(self, actor, content_object)
    186 def give_permission(self, actor, content_object):
--> 187     return self.give_or_remove_permission(actor, content_object, giving=True)

File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/models.py:248, in RoleDefinition.give_or_remove_permission(self, actor, content_object, giving, sync_action)
    245         to_update.remove(object_role)
    246     object_role.delete()
--> 248 update_after_assignment(update_teams, to_update)
    250 if not sync_action and self.name in permission_registry._trackers:
    251     tracker = permission_registry._trackers[self.name]

File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/triggers.py:86, in update_after_assignment(update_teams, to_update)
     83 if update_teams:
     84     compute_team_member_roles()
---> 86 compute_object_role_permissions(object_roles=to_update)

File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/caching.py:196, in compute_object_role_permissions(object_roles, types_prefetch)
    194         raise RuntimeError(f'Could not find a place in cache for {evaluation}')
    195 if to_add_int:
--> 196     RoleEvaluation.objects.bulk_create(to_add_int)
    197 if to_add_uuid:
    198     RoleEvaluationUUID.objects.bulk_create(to_add_uuid)

The mechanics here are that team gets the "view" permission to the inv object twice, in Process 1 and Process 2. Similar to the (imaginary) reproducer. Since neither process has committed their transaction, neither is aware of what the other has done, and they both seek to add this same permissions.

We do not want to list the same RoleEvaluation multiple times. The constraint is correct. We just don't really care very much if the record already exists, and is recently created. This is because the lookup fields are basically the same as all the fields. There is no way for the constraint to be violated, but still lose some other information. There is no other information. The caching logic wishes to assert we have some certain state, and we already have that state in the event of conflicts.

The error above is fixed by this patch, in case that wasn't clear.

sonarcloud[bot] commented 5 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud