fix: Resolve missing options in DOI approval form

alukach commented 1 year ago

What I'm changing

This PR fixes #513 wherein certain records weren't showing up on the DOI approval view's multiselect inputs.

The issue appears to have been caused by changes introduced in #405, where the query logic around generating records for the multi-select was altered. I can not tell why these form-logic changes were made as they do not seem to relate to the issue that #405 was addressing (breadcrumb link generation). Nonetheless, the PR altered the way in which we retrieve a queryset of drafts to be used when rendering the <select> element for foreign key fields on model forms. The update alters the logic to retrieve the latest update:

changes = (
    models.Change.objects.of_type(dest_model)
    .exclude(status=models.Change.Statuses.IN_TRASH)
    .annotate(effective_uuid=Coalesce("model_instance_uuid", "uuid"))
    .order_by("effective_uuid", "-updated_at")
)

latest_change_uuids = []
for u, changes in groupby(changes, lambda change: change.effective_uuid):
    latest_change_uuids.append(list(changes)[0].uuid)

return (
    ChangeWithIdentifier("short_name")
    .objects.filter(uuid__in=latest_change_uuids)
    .annotate_from_published(dest_model, to_attr="short_name", identifier=identifier_field)
    ...
    .annotate(effective_uuid=Coalesce("model_instance_uuid", "uuid"))
)

There are a few issues with this strategy (mentioned below in the "How I Did It" section), however the major issue that created this bug is that we are returning a queryset of records with uuid attributes that don't necessarily match the canonical UUID of the real-world concept. We attempted to account for this by annotating the queryset with an effective_uuid attribute and customizing the iterator of the ChangeChoiceField with an iterator that makes use of the effective_uuid attribute:

https://github.com/NASA-IMPACT/admg-backend/blob/d475bfe04131c18bf0cde5d66bda0e51935e7bd6/admin_ui/fields.py#L131-L144

However, we neglected to customize the iterator on the the ChangeMultipleChoiceField, which is used by the DOI approval view:

https://github.com/NASA-IMPACT/admg-backend/blob/d475bfe04131c18bf0cde5d66bda0e51935e7bd6/admin_ui/fields.py#L123-L128

Because of this, the <option> elements rendered within the multiple choice <select> point to the UUID of the latest drafts, not the canonical UUIDs of the real-world concept. As such, the UUID set on a given DOI does not match any of the options in the select dropdown and no selected attribute is rendered.

How I did it

The current strategy of retrieving the most recent drafts has a few of issues:

We are presumably retrieving the most recent drafts to get the most up-to-date short_name values. However, on the final returned queryset, we run .annotate_from_published(..., to_attr="short_name"), meaning that we are always pulling the short_name attribute from the published record. If there is no published record, then the logic to get the most recent draft is superfluous because we would only have a single unpublished action=CREATE draft.
The logic does not account for situations where a published record is later deleted. We do have logic to exclude anything where status=models.Change.Statuses.IN_TRASH, but that only applies to discarding the working draft, not deleting a published record.
It makes the assumption that the most up-to-date draft will have a short_name value. This is something I'm a bit hazy on, but I believe that there was a time (possibly now?) that the update attribute of an action=UPDATE draft only contained the diff of what was being changed on a published record. On a clone of the staging database, we can run the following to see that there are a number of drafts that do not have a update -> 'short_name' value:
```
In [49]: Change.objects.of_type(Campaign, Platform, Deployment, Instrument).exclude(update__has_key='short_name').values('action').annotate(total=Count('action'))
Out[49]: <ChangeQuerySet [{'action': 'Delete', 'total': 2}, {'action': 'Update', 'total': 343}]>
```
The performance is typically low when we are running a query, parsing the results in Python, and then using those results to run a new query. There's likely room for a refactor to improve this by performing the grouping on the DB.

There could have been an approach to continue to use the query as currently written in get_queryset_for_model() and to update our ChangeMultipleChoiceField to descend from ModelMultipleChoiceField rather than the current MultipleChoiceField. However, for the reasons described above, I felt it was better to simply back-out the changes introduced in #405. I also attempted to add more documentation, as this is a particularly dense portion of our codebase and can be challenging to understand.

Outcome

This resolved the issue and improves the load time by about 65-80% (at least when running in slow DEBUG mode).

Before

http://localhost:8001/campaigns/fb60a9a9-d127-403e-acc7-9203346d48fe/doi-approval

http://localhost:8001/campaigns/cbfb7dff-400d-4639-ba2d-431b00ae8971/doi-approval

After

http://localhost:8001/campaigns/fb60a9a9-d127-403e-acc7-9203346d48fe/doi-approval

form queries

http://localhost:8001/campaigns/cbfb7dff-400d-4639-ba2d-431b00ae8971/doi-approval

form queries

How you can test it

For testing, it is recommended to have a dev database matching the staging database. To achieve this:

Run docker-compose setup in codebase
Stop webserver service: docker stop <WEBSERVER_PROCESS_ID>
Jump into db services: docker exec -it <DB_PROCESS_ID> bash
Drop dev db: dropdb -p 5432 -U user admg_webapp
Create empty dev db: createdb -p 5432 -U user admg_webapp
Populate db: pg_dump -h admgtestdb.ccsbdjpqglnr.us-east-1.rds.amazonaws.com -U postgres prod_clone_20230523 | psql postgres://user:password@localhost/admg_webapp
Start docker webserver service: I stop docker compose and restart it

The following URLs demonstrate the issue:

Questions

@edkeeble do you recall why you changed fields.py in #405? To my understanding, that logic has nothing to do with breadcrumb links (ie it is purely focused on form fields).

alukach commented 1 year ago

I guess there's no way to do Draft PRs right now in this org 🤷 . Going to keep open but note that it's not ready.

edkeeble commented 1 year ago

It was quite a while ago, so I don't remember exactly, but I would guess it was because the change in the UI was meant to reflect the Campaign Draft related to the Deployment Draft not only in the breadcrumbs, but throughout that view (hence the addition of links within the form to the Published Campaign and Latest Draft). My understanding at the time, I think, was that the related campaign should point to the latest draft for that campaign, since we still supported multiple drafts. I expect the issues you're correcting here are the result of my misunderstanding the underlying data models, though it's odd that it's only coming up 9 months after those changes were committed.

NASA-IMPACT / admg-backend