The report of this is https://github.com/ansible/awx/issues/15185, and although it is short on specific of what went on, with some speculation I was able to mostly logically brute-force out the probable cause. The imagined reproducer:
In the "old" AWX UI, go to give a team permission to 2 different roles to the same thing, like inventory update and adhoc roles.
It is assumed that this team has members already.
Submit the requests to assign those permissions concurrently.
The result might be a little race-y, but considering how long the transactions are open for I don't think it should be very difficult to hit. The reproducer is given here, which follows the same pattern as my linked prior fix. This is not possible to do in unit tests because of the reasons given there (checkpoints vs proper transactions).
from django.contrib.contenttypes.models import ContentType
from test_app.models import *
from ansible_base.rbac.models import *
from django.db import transaction
Process 1
org = Organization.objects.create(name='repro')
team = Team.objects.create(name='repro-new', organization=org)
inv = Inventory.objects.create(organization=org, name='for-repro')
rd = RoleDefinition.objects.create_from_permissions(name='inv-view', permissions=['view_inventory'], content_type=ContentType.objects.get_for_model(Inventory))
u = User.objects.last()
member_rd = RoleDefinition.objects.get(name='Team Member')
member_rd.give_permission(u, team)
transaction.atomic().__enter__()
assignment = rd.give_permission(team, inv)
# wait for process 2 to hang
transaction.atomic().__exit__(None, None, None)
Process 2
team = Team.objects.get(name='repro-new', organization__name='repro')
inv = Inventory.objects.get(name='for-repro', organization__name='repro')
rd2 = RoleDefinition.objects.create_from_permissions(name='inv-view-del', permissions=['view_inventory', 'delete_inventory'], content_type=ContentType.objects.get_for_model(Inventory))
with transaction.atomic():
assignment = rd2.give_permission(team, inv)
# will hang, return to process 1
After you do the work to run all this, you should see the IntegrityError happen in Process 2 after the last action in Process 1. The head of that:
IntegrityError Traceback (most recent call last)
Cell In[4], line 2
1 with transaction.atomic():
----> 2 assignment = rd2.give_permission(team, inv)
File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/models.py:187, in RoleDefinition.give_permission(self, actor, content_object)
186 def give_permission(self, actor, content_object):
--> 187 return self.give_or_remove_permission(actor, content_object, giving=True)
File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/models.py:248, in RoleDefinition.give_or_remove_permission(self, actor, content_object, giving, sync_action)
245 to_update.remove(object_role)
246 object_role.delete()
--> 248 update_after_assignment(update_teams, to_update)
250 if not sync_action and self.name in permission_registry._trackers:
251 tracker = permission_registry._trackers[self.name]
File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/triggers.py:86, in update_after_assignment(update_teams, to_update)
83 if update_teams:
84 compute_team_member_roles()
---> 86 compute_object_role_permissions(object_roles=to_update)
File ~/repos/awx/testing/django-ansible-base/ansible_base/rbac/caching.py:196, in compute_object_role_permissions(object_roles, types_prefetch)
194 raise RuntimeError(f'Could not find a place in cache for {evaluation}')
195 if to_add_int:
--> 196 RoleEvaluation.objects.bulk_create(to_add_int)
197 if to_add_uuid:
198 RoleEvaluationUUID.objects.bulk_create(to_add_uuid)
The mechanics here are that team gets the "view" permission to the inv object twice, in Process 1 and Process 2. Similar to the (imaginary) reproducer. Since neither process has committed their transaction, neither is aware of what the other has done, and they both seek to add this same permissions.
We do not want to list the same RoleEvaluation multiple times. The constraint is correct. We just don't really care very much if the record already exists, and is recently created. This is because the lookup fields are basically the same as all the fields. There is no way for the constraint to be violated, but still lose some other information. There is no other information. The caching logic wishes to assert we have some certain state, and we already have that state in the event of conflicts.
The error above is fixed by this patch, in case that wasn't clear.
We hit another version of https://github.com/ansible/django-ansible-base/pull/281 from out in the wild.
The report of this is https://github.com/ansible/awx/issues/15185, and although it is short on specific of what went on, with some speculation I was able to mostly logically brute-force out the probable cause. The imagined reproducer:
The result might be a little race-y, but considering how long the transactions are open for I don't think it should be very difficult to hit. The reproducer is given here, which follows the same pattern as my linked prior fix. This is not possible to do in unit tests because of the reasons given there (checkpoints vs proper transactions).
Process 1
Process 2
After you do the work to run all this, you should see the
IntegrityError
happen in Process 2 after the last action in Process 1. The head of that:The mechanics here are that
team
gets the "view" permission to theinv
object twice, in Process 1 and Process 2. Similar to the (imaginary) reproducer. Since neither process has committed their transaction, neither is aware of what the other has done, and they both seek to add this same permissions.We do not want to list the same
RoleEvaluation
multiple times. The constraint is correct. We just don't really care very much if the record already exists, and is recently created. This is because the lookup fields are basically the same as all the fields. There is no way for the constraint to be violated, but still lose some other information. There is no other information. The caching logic wishes to assert we have some certain state, and we already have that state in the event of conflicts.The error above is fixed by this patch, in case that wasn't clear.