Multiple conflicting "view_event_menu" permissions

bemoody commented 1 year ago

Currently, in the live and staging systems, there are two django.contrib.auth.models.Permission objects that both have codename = "view_event_menu". One that refers to the current events.models.Event, and one that refers to the now-deleted user.models.Event.

This results in a server error when trying to join an event, as @danamouk and @xborrat discovered.

Presumably we could go into the database and delete the obsolete permission object by hand, but this is weird; I don't understand why the object is there and wasn't deleted automatically by Django when applying the 0049_auto_20230112_1430 migration.

Running resetdb and loaddemo from scratch - either using sqlite or postgresql - seems to result in only one permission object.

bemoody commented 1 year ago

A more robust way to write it would be something like...

Permission.objects.get(codename="view_event_menu",
                       content_type__app_label="events")

or else...

Permission.objects.get(codename="view_event_menu",
                       content_type=ContentType.objects.get_for_model(Event))

It seems that Django generally expects permission names to be unique within an app; most functions that handle permissions expect you to specify the name as a string "{app_label}.{codename}". Our User.get_users_with_permission should probably do the same thing.

At a guess, maybe the obsolete permission wasn't deleted because there were existing users with that permission? Should test this. It could be a problem if we think we've deleted a permission but it's still lingering in the production database.

bemoody commented 1 year ago

At a guess, maybe the obsolete permission wasn't deleted because there were existing users with that permission?

Just tried installing d6d195aaa27da54d14f31e7c59f3eef004537cd3 and pushing 3fe37a8e081100965a2286ebf616952d727c6abe. I get:

physionet=> select * from auth_permission as p inner join django_content_type as c on p.content_type_id = c.id where p.codename = 'view_event_menu';
 id |               name                | content_type_id |    codename     | id | app_label | model 
----+-----------------------------------+-----------------+-----------------+----+-----------+-------
 42 | Can view event menu in the navbar |              25 | view_event_menu | 25 | user      | event
 67 | Can view event menu in the navbar |              66 | view_event_menu | 66 | events    | event
(2 rows)

physionet=> select * from auth_group_permissions as a inner join auth_group as g on a.group_id = g.id where permission_id in (42, 67);
 id | group_id | permission_id | id | name 
----+----------+---------------+----+------
 42 |        7 |            42 |  7 | Host
(1 row)

physionet=> select * from user_user_user_permissions where permission_id in (42, 67);
 id | user_id | permission_id 
----+---------+---------------
(0 rows)

So it's not that there were users with the permission in question but there was a group with that permission.

If dealing with this situtation in the future it should preferably be handled by a data migration. I still don't see any easy way we could have detected the error (the migration didn't give any warning messages or anything...)

bemoody commented 1 year ago

So it's not that there were users with the permission in question but there was a group with that permission.

No luck, same thing happens even if I delete the "Host" group before the second push. Which I guess is nice for consistency, but yeesh. Does Django never actually delete permissions?

tompollard commented 1 year ago

There is a remove-stale-contenttypes management command that does some cleaning of permissions: https://docs.djangoproject.com/en/4.1/ref/django-admin/#remove-stale-contenttypes

(physionet) python manage.py remove_stale_contenttypes
Some content types in your database are stale and can be deleted.
Any objects that depend on these content types will also be deleted.
The content types and dependent objects that would be deleted are:

    - Content type for user.event
    - 6 auth.Permission object(s)
    - 8 auth.Group_permissions object(s)
    - 2 user.User_user_permissions object(s)
    - Content type for user.eventparticipant
    - 4 auth.Permission object(s)
    - 4 auth.Group_permissions object(s)

This list doesn't include any cascade deletions to data outside of Django's
models (uncommon).

Are you sure you want to delete these content types?
If you're unsure, answer 'no'.

bemoody commented 1 year ago

Ah!

And migrate -v2 makes it clear that content types are only created after running the entire batch of migrations, not after each migration.

Seems like it might be wise to invoke remove_stale_contenttypes in post-receive. I can sort of understand why this isn't done automatically, but... hard to see how this could really break things?

bemoody commented 1 year ago

https://code.djangoproject.com/ticket/24865#comment:13 suggests that this was originally done automatically.

tompollard commented 1 year ago

Seems like it might be wise to invoke remove_stale_contenttypes in post-receive. I can sort of understand why this isn't done automatically, but... hard to see how this could really break things?

Adding to a post-receive hook seems like it would be okay to me, though I'm not sure I full understand the risks (particularly this comment: "I think that the other side of not removing stale content types for fear of data loss is leaving stale GenericForeignKeys, which will break when accessed").

bemoody commented 1 year ago

Yeah, automatically deleting data is kind of scary.

I've said before that we shouldn't be using generic foreign keys in the ways we're currently using them, and for example (in project/modelcomponents/authors.py):

    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    project = GenericForeignKey('content_type', 'object_id')

it would probably be wiser and safer to use on_delete=models.PROTECT.

That said, we need to carefully consider what will happen to old data every time we delete or rename models. So is it any worse to automatically purge old ContentTypes?

I actually don't know how HDN / @alistairewj is handling upgrades/migrations. Is it just docker/dev-entrypoint.sh?

alistairewj commented 1 year ago

For development, yes, we use dev-entrypoint.sh which will run the migrations on docker-compose up. For production, we have a cron job we manually trigger when we redeploy (the job is essentially /code/physionet-django/manage.py migrate with appropriate env vars set).

bemoody commented 1 year ago

I wonder if the safest course of action would be to explicitly delete old content types / permissions through a data migration. I don't know if we can generate that code automatically but it shouldn't be that hard, right?

MIT-LCP / physionet-build

Multiple conflicting "view_event_menu" permissions #1818