MIT-LCP / physionet-build

The new PhysioNet platform.
https://physionet.org/
BSD 3-Clause "New" or "Revised" License
56 stars 20 forks source link

Reference order field is not unique per project #2137

Open bemoody opened 11 months ago

bemoody commented 11 months ago

Originally, the Reference and PublishedReference objects had no explicit order, which was a mess since it led to references not being displayed in the intended order.

There was, however, a database-level constraint that no project could have two references with the same text content ("description".)

All of this was changed through a series of pull requests (#1602, #1817, #1819) until arriving at the current behavior:

class Reference(models.Model):
    class Meta:
        unique_together = (('description', 'object_id', 'order'),)

class PublishedReference(models.Model):
    class Meta:
        unique_together = (('description', 'project', 'order'))

Let's understand what this is saying. For PublishedReference, the above code is saying that there can be no two references with the same description and the same project and the same order.

There can, however, be:

For Reference, it's saying that there can be no two references with the same description, same object_id, and the same order.

There can, however, be:

And there cannot be two references with the same description, same order, and different projects that happen to have the same object_id (e.g., ActiveProject number 1 and ArchivedProject number 1.)

This is of course nonsense and not what we want. What we want is:

class Reference(models.Model):
    class Meta:
        unique_together = (
            ('content_type', 'object_id', 'description'),
            ('content_type', 'object_id', 'order'),
        )

class PublishedReference(models.Model):
    class Meta:
        unique_together = (
            ('project', 'description'),
            ('project', 'order'),
        )

or else:

class Reference(models.Model):
    class Meta:
        unique_together = (
            ('content_type', 'object_id', 'order'),
        )

class PublishedReference(models.Model):
    class Meta:
        unique_together = (
            ('project', 'order'),
        )

because I'm not sure it was ever all that helpful to prevent duplicate references at the database level.

We also, I think, want to make the order field non-nullable.

Note that currently on PhysioNet, there is one published project that contains duplicate orders (physiotag-1.0.0 has two references with order=4) as well as one active project (RPv08rcu97FdkJhIft3f has two references with order=9), and a few other projects with unusual order sequences.

for p in project.models.PublishedProject.objects.all():
  s = [r1.order for r1 in p.references.order_by('order')]
  if s != list(range(1, len(s)+1)):
    print(p.slug, p.version, s)
bemoody commented 11 months ago

physiotag-1.0.0 has two references with order=4

Current text on the page is

3. Clifford GD, Silva I, Moody B, Li Q, Kella D, Shahin A, Kooistra T, Perry D, Mark RG. The PhysioNet/computing in cardiology challenge 2015: reducing false arrhythmia alarms in the ICU. In 2015 Computing in Cardiology Conference (CinC) 2015 Sep 6 (pp. 273-276). IEEE.

4. Xie C, McCullum L, Johnson A, Pollard T, Gow B, Moody B. Waveform Database Software Package (WFDB) for Python. PhysioNet 2022;URL https://doi.org/10. 13026/MMPM-2V55.

but there is no citation of [3] and the [4] is referring to the CinC paper.

bemoody commented 4 months ago

@tompollard @li-lcp

This is the issue I mentioned earlier today.

Notice that NewProjectVersionForm does not set order when creating References. That's a serious problem, and would have been apparent if the Reference class forced the order to be unique and non-null.

bemoody commented 4 months ago

Also notice that /projects/X/preview/ (and /content/X/) is sorted by order, while /projects/X/content/ is apparently sorted by id. (See BaseModelFormSet.get_queryset in django/forms/models.py)

lilehman commented 2 months ago

Issue: References shuffled and incorrectly displayed on the Project Page. (Not sure this is the same issue as the one reported above.)

More details:

The order of references look correct in the References section in "copy-edit" interface. However, References listed on Project Page seems to have been randomly shuffled for the following project.

A Brazilian Multilabel Ophthalmological Dataset (BRSET) https://www.physionet.org/projects/NFgVDYAzy5iv1Z5Sfvy6/preview/?Admin=True

Is there a way to fix this? Thanks

bemoody commented 2 months ago

I did start trying to fix this but I didn't see an easy way to address the existing broken projects.

There is no way for us to know automatically whether the list of references is correct - we have to manually review them.

Of course, if two different versions of the same project have exactly the same references in a different order, we can guess that one of them is wrong - but we don't know which one is wrong, or whether the author noticed it was wrong and tried to fix it. If the author did try to fix it, we don't know whether they did so based on the numbers shown in the "Content" page or the numbers shown in the "Preview" page.

So I'm sorry - we should do something ASAP to stop creating more damage every time a new version is created. But that won't help with all of the existing projects in the system that need to be fixed manually.

bemoody commented 2 months ago

Okay, I'm guessing I know what happened with chexchonet 1.0.0, mimic-iv-fhir 1.0, and physiotag 1.0.0.

It looks like edit_content_item does not change order of existing references when it deletes a reference.

I also didn't understand, until I saw Li-wei's example this morning, that ReferenceFormSet doesn't actually "save" all of the objects; presumably "save" \~cleverly\~ applies only to the objects that contain a modified form field.

So physiotag probably had refs 1, 2, 3, 4; then somebody deleted the third one (making it 1, 2, 4); then somebody added two more (making it 1, 2, 4, 4; then 1, 2, 4, 4, 5.)

bemoody commented 2 months ago

Current plan:

For what it's worth, I have an intense and abiding dislike of the hand-edited reference list, and my long-term plan is to completely eliminate it for new projects.

bemoody commented 1 month ago
bemoody commented 1 month ago
bemoody commented 1 month ago