Closed etianen closed 3 years ago
See the discussion taking place on the core django bug tracker:
https://code.djangoproject.com/ticket/27477
A similar issue exists in the django admin and class-based generic views.
Any news? Taking a look at the django bug linked by @etianen I belive we can achieve the solution in the same way:
In my opinion 2 or 3 is the correct approach.
I can do a pull request implementing any of these solutions. Do people have a preference or onother approach?
@mcanaves I'd say we have to consider this as "On-Hold". There's been no progress on the upstream issue and I'm very much inclined to see what the answer is there before implementing something possibly different here.
In the meantime I'd advise implementing your own UpdateMixin if this is an actual issue for you.
I'm not going to close this but I am un-milestoning for now.
@mcanaves If you're keen to contribute on this, I think the best approach would be to champion the Django issue. Looks like it needs a discussion on Django-Dev. These things always benefit from someone being prepared to Hustle. 🙂
@mcanaves I originally raised this issue, plus the one on Django, but ran out of time to push it to completion. Please do continue the efforts in my place, if it's important to you. :)
I'm not against us resolving this for REST framework's generic views, even if the same issue persists in Django's generic views. I'm not even that fussed about making it a setting if it goes into a major point release, and the behavior is clearly documented. (We can/could suggest eg. a mixin class for backward compatibility concerns, or else have call select_for_update
determined via a should_select_for_update
hook or similar.)
What it would need at this point is:
See also #2648.
And #5674
We have been bitten hard by this, @jarekwg has a fixed this inline but it would be great to not have to do this on every endpoint by moving the solution into DRF.
Looking forward a PR to review it.
Heh yep, I overrode the update
method, using select_for_update
(prepare to cringe):
def update(self, request, *args, **kwargs):
partial = kwargs.pop('partial', False)
with transaction.atomic():
# Call get_object, but don't use it. This ensures any permission exceptions or otherwise
# will still get raised despite us grabbing the instance via select_for_update.
self.get_object()
instance = self.get_queryset().select_for_update().get(pk=kwargs['pk'])
# Regrettably, since we're not calling `get_object` ourselves now, need to re-add all the
# bits the mixins for `get_object` add..
instance.tracker.geolocation = request.META.get('HTTP_GEOLOCATION')
if hasattr(instance, 'tracker') and request.user.is_authenticated:
instance.tracker.current_user = request.user
# END REGRET
serializer = self.get_serializer(instance, data=request.data, partial=partial)
serializer.is_valid(raise_exception=True)
self.perform_update(serializer)
if getattr(instance, '_prefetched_objects_cache', None):
# If 'prefetch_related' has been applied to a queryset, we need to
# forcibly invalidate the prefetch cache on the instance.
instance._prefetched_objects_cache = {}
Simply put, the fix means we lose out on the niceness get_object
offers, and are forced to have to re-implement its functionality...
I like the the suggestion mentioned somewhere earlier with passing for_update
into get_object
.
I'd also like the solution to be baked into DRF, without it having to be explicitly enabled.
The largest (albeit not only) issue is the race condition with partial_update
, which can revert a record's data, even if one of the requests sends an empty payload to PATCH.. Perhaps the update_fields
param django offers insave
should be leveraged too..
I'll have a crack at offering a clean PR for this after reading all of the earlier comments more thoroughly..
e: Found a slightly nicer implementation, though would still be nicer in DRF itself..:
def update(self, request, *args, **kwargs):
partial = kwargs.pop('partial', False)
with transaction.atomic():
# Lock row using select_for update, but actually use the instance from get_object,
# so that we don't lose any mixin goodies. This results in TWO db queries, which we've
# decide to just live with for now.
InspectionItem.objects.select_for_update().get(pk=kwargs['pk'])
instance = self.get_object()
serializer = self.get_serializer(instance, data=request.data, partial=partial)
serializer.is_valid(raise_exception=True)
self.perform_update(serializer)
if getattr(instance, '_prefetched_objects_cache', None):
# If 'prefetch_related' has been applied to a queryset, we need to
# forcibly invalidate the prefetch cache on the instance.
instance._prefetched_objects_cache = {}
@jarekwg using the update_fields parameter has been suggested multiple times. #2648 #5674
select_for_update is a more heavy weight approach that would eliminate the need for update_fields. In my case however the race condition was with different users whose updates could only be disjoint sets so update_fields was the ideal solution to resolve the race condition without incurring the overhead of locking the database row.
I don't really see the downside of always utilizing update_fields and while I'm not super familiar with Django's ORM layer I'd assume if anything it'd be a slight performance improvement.
As a heavy user of this wonderful framework, my own 2p regarding these issues:
1) Unless there's a downside to update_fields, always use it. Edit: From previous discussion the only downside to this seems to be if users have overridden the save functionality on models and didn't implement to spec. 2) Provide an atomic update mixin for serializers that calls select_for_update. Locking the db row incurs a performance penalty so it probably shouldn't be enabled by default.
These issues aren't hard to workaround for end users, but it does take some digging around in the source code to figure out the best approach to implement them. So at least some documentation on the best solution from the perspective of the DRF authors might be in order.
A PATCH/partial update should only update the fields it touches by default, and update_fields
is the way to do it.
The suggestion around select_for_update
is a broader topic that is related but can be addressed separately. It would be a good feature but arguably not default behaviour.
As pointed by @litchfield and in #5897, adding the update_fields to save would prevent many of this problems. Quoting @tleewu from #5897 :
def update(self, instance, validated_data):
raise_errors_on_nested_writes('update', self, validated_data)
info = model_meta.get_field_info(instance)
update_fields = []
# Simply set each attribute on the instance, and then save it.
# Note that unlike `.create()` we don't need to treat many-to-many
# relationships as being a special case. During updates we already
# have an instance pk for the relationships to be associated with.
for attr, value in validated_data.items():
update_fields.append(attr)
if attr in info.relations and info.relations[attr].to_many:
field = getattr(instance, attr)
field.set(value)
else:
setattr(instance, attr, value)
instance.save(update_fields=update_fields)
return instance
There is a race condition in PUT and PATCH operations where a user can edit a model instance at the same time as another user. Consider the following two operations occurring in parallel:
{"is_staff": True}
User.objects.update(is_superuser=True)
If (1) load the model instance, then (2) runs, then (1) saves the model instance, the update in (2) will be lost.
The solution is to call
select_for_update
on the queryset in theupdate()
method ofUpdateModelMixin
.Checklist
master
branch of Django REST framework.