Open bemoody opened 11 months ago
physiotag-1.0.0 has two references with order=4
Current text on the page is
3. Clifford GD, Silva I, Moody B, Li Q, Kella D, Shahin A, Kooistra T, Perry D, Mark RG. The PhysioNet/computing in cardiology challenge 2015: reducing false arrhythmia alarms in the ICU. In 2015 Computing in Cardiology Conference (CinC) 2015 Sep 6 (pp. 273-276). IEEE.
4. Xie C, McCullum L, Johnson A, Pollard T, Gow B, Moody B. Waveform Database Software Package (WFDB) for Python. PhysioNet 2022;URL https://doi.org/10. 13026/MMPM-2V55.
but there is no citation of [3]
and the [4]
is referring to the CinC paper.
@tompollard @li-lcp
This is the issue I mentioned earlier today.
Notice that NewProjectVersionForm does not set order when creating References. That's a serious problem, and would have been apparent if the Reference class forced the order to be unique and non-null.
Also notice that /projects/X/preview/ (and /content/X/) is sorted by order
, while /projects/X/content/ is apparently sorted by id
. (See BaseModelFormSet.get_queryset in django/forms/models.py)
Issue: References shuffled and incorrectly displayed on the Project Page. (Not sure this is the same issue as the one reported above.)
More details:
The order of references look correct in the References section in "copy-edit" interface. However, References listed on Project Page seems to have been randomly shuffled for the following project.
A Brazilian Multilabel Ophthalmological Dataset (BRSET) https://www.physionet.org/projects/NFgVDYAzy5iv1Z5Sfvy6/preview/?Admin=True
Is there a way to fix this? Thanks
I did start trying to fix this but I didn't see an easy way to address the existing broken projects.
There is no way for us to know automatically whether the list of references is correct - we have to manually review them.
Of course, if two different versions of the same project have exactly the same references in a different order, we can guess that one of them is wrong - but we don't know which one is wrong, or whether the author noticed it was wrong and tried to fix it. If the author did try to fix it, we don't know whether they did so based on the numbers shown in the "Content" page or the numbers shown in the "Preview" page.
So I'm sorry - we should do something ASAP to stop creating more damage every time a new version is created. But that won't help with all of the existing projects in the system that need to be fixed manually.
Okay, I'm guessing I know what happened with chexchonet 1.0.0
, mimic-iv-fhir 1.0
, and physiotag 1.0.0
.
It looks like edit_content_item
does not change order of existing references when it deletes a reference.
I also didn't understand, until I saw Li-wei's example this morning, that ReferenceFormSet doesn't actually "save" all of the objects; presumably "save" \~cleverly\~ applies only to the objects that contain a modified form field.
So physiotag
probably had refs 1, 2, 3, 4; then somebody deleted the third one (making it 1, 2, 4); then somebody added two more (making it 1, 2, 4, 4; then 1, 2, 4, 4, 5.)
Current plan:
Make a data migration to fix 75% of the existing broken projects.
Ensure that projects with undefined/inconsistent reference order cannot be submitted/published.
If project has undefined/inconsistent reference order then add a "confirm order" button to the content or copyedit page.
Fix ReferenceFormSet so that new references won't collide with existing references.
Don't bother trying to require 'order' to match the index. Just require 'order' to be monotonic.
Manually repair the remaining dozen-or-so broken published projects.
Leave remaining unpublished projects for somebody else to worry about.
For what it's worth, I have an intense and abiding dislike of the hand-edited reference list, and my long-term plan is to completely eliminate it for new projects.
[('description', 'project')]
: forbid duplicate descriptions as was originally intended; ignore order.id
order in the preview page, to match the content page and reduce confusion.
Originally, the
Reference
andPublishedReference
objects had no explicit order, which was a mess since it led to references not being displayed in the intended order.There was, however, a database-level constraint that no project could have two references with the same text content ("description".)
All of this was changed through a series of pull requests (#1602, #1817, #1819) until arriving at the current behavior:
Let's understand what this is saying. For
PublishedReference
, the above code is saying that there can be no two references with the same description and the same project and the same order.There can, however, be:
For
Reference
, it's saying that there can be no two references with the same description, same object_id, and the same order.There can, however, be:
And there cannot be two references with the same description, same order, and different projects that happen to have the same object_id (e.g., ActiveProject number 1 and ArchivedProject number 1.)
This is of course nonsense and not what we want. What we want is:
or else:
because I'm not sure it was ever all that helpful to prevent duplicate references at the database level.
We also, I think, want to make the
order
field non-nullable.Note that currently on PhysioNet, there is one published project that contains duplicate
order
s (physiotag-1.0.0
has two references withorder=4
) as well as one active project (RPv08rcu97FdkJhIft3f
has two references withorder=9
), and a few other projects with unusual order sequences.