MIT-LCP / physionet-build

The new PhysioNet platform.
https://physionet.org/
BSD 3-Clause "New" or "Revised" License

References in unspecified order #590

Closed: bemoody closed this issue 4 years ago

bemoody commented 5 years ago

"References" in project pages are displayed as a numbered list, but the order in which they are shown is unspecified, AFAICT.

(In project_preview and published_project, they are retrieved by 'project.references.all()'.)

Even if you assume that the order that references are created is preserved, there's no reasonable way for authors to change that order.

References should be listed in alphabetical order anyway (though in some cases it may be hard to do this correctly and automatically.)

bemoody commented 5 years ago

Relatedly, each reference should also have an id attribute by which it can be referred to in the text.

It could just be "ref-{{ forloop.counter }}", but that relies on there actually being a stable sorting key!

But as an author, I'd actually prefer that the id be something like "ref-goldberger-2003".

tompollard commented 4 years ago

I came across this (or a related) issue when moving a URL from a project description to the reference section during the copyediting stage.

Brief summary:

  1. visited the copyedit page: https://physionet.org/console/submitted-projects/SLUG/copyedit/
  2. clicked the "+" button next to the references to add a reference, as shown in the screenshot below:

[Screenshot: Screen Shot 2020-04-25 at 14 38 17]

  3. added a new reference at the end of the sequence (ref 9 in this case).

[Screenshot: Screen Shot 2020-04-25 at 14 42 19]

  4. in the project preview (https://physionet.org/projects/SLUG/preview/?Admin=True), the new reference appeared as reference 1.

[Screenshot: Screen Shot 2020-04-25 at 14 42 52]

tompollard commented 4 years ago

References should be listed in alphabetical order anyway (though in some cases it may be hard to do this correctly and automatically.)

I'm not sure I agree that alphabetical is best, but we definitely need to come up with a plan. Personally I like numerical citations (e.g. Vancouver Style):

Vancouver is a numbered referencing style commonly used in medicine and science, and consists of:

  • Citations to someone else's work in the text, indicated by the use of a number.
  • A sequentially numbered reference list at the end of the document providing full details of the corresponding in-text reference.

tompollard commented 4 years ago

@Lucas-Mc please could you take a look at the ordering issue mentioned in this thread? It is quite problematic when trying to add a new reference to a project (e.g. during the copyediting stage).

Specifically, it would be good to fix the following bug:

[Screenshot: Screen Shot 2020-05-27 at 11 28 55]

I don't think we need to agree on the in-text citation format right now (and this seems more like an editorial decision than anything - i.e. what is our preferred style).

Lucas-Mc commented 4 years ago

Upon further investigation, I found out that each time the page is refreshed a new order is used, as you were saying, irrelevant to the order stated in the reference addition / editing section.

Every time this command is run it generates a new order: project.references.all()

I fixed it for project previews using .order_by('id') though it is not working the same for published projects and the ID numbers are not what I expected (i.e. 9, 10, 11, 12). I think new ID numbers are assigned each round and once it is published the order changes again for some reason.
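
For illustration, here is a minimal sketch of why an explicit sort key matters. It uses Python's built-in sqlite3 module rather than the Django ORM, and the table name `reference` and its columns are hypothetical stand-ins, not the project's actual schema. SQL makes no promise about row order unless the query has an `ORDER BY` clause, which is what `.order_by('id')` adds.

```python
import sqlite3

# In-memory database with a hypothetical "reference" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference ("
    "id INTEGER PRIMARY KEY, project_id INTEGER, description TEXT)"
)
# Rows are deliberately inserted out of id order.
conn.executemany(
    "INSERT INTO reference (id, project_id, description) VALUES (?, ?, ?)",
    [
        (12, 1, "Third reference"),
        (9, 1, "First reference"),
        (10, 1, "Second reference"),
    ],
)


def ordered_references(connection, project_id):
    """Return (id, description) rows for one project, sorted by id."""
    rows = connection.execute(
        "SELECT id, description FROM reference "
        "WHERE project_id = ? ORDER BY id",
        (project_id,),
    )
    return rows.fetchall()


for ref_id, description in ordered_references(conn, 1):
    print(ref_id, description)
```

Sorting by id only gives a repeatable order. It does not let an author rearrange references, which is the concern raised earlier in the thread.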

bemoody commented 4 years ago

Upon further investigation, I found out that each time the page is refreshed a new order is used, as you were saying, irrelevant to the order stated in the reference addition / editing section.

Yeah, that's what you should see in the development server. On the production server you should see that the order is fixed (I hope) but not necessarily predictable.

One way to fix this would be to add a numeric position field to each reference, and update those position fields when an item is inserted/removed. The logic for updating these fields is not trivial, but is implemented already for authors; if you go this route, it would be best to turn the logic into an abstract class that could be inherited by different models.
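
A rough sketch of the position-field idea, in plain Python rather than Django models. The class and method names are made up for illustration and this is not the existing author-ordering code. It only shows the bookkeeping: positions are 1-based and are renumbered whenever an item is inserted or removed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reference:
    description: str
    position: int = 0


@dataclass
class ReferenceList:
    items: List[Reference] = field(default_factory=list)

    def _renumber(self) -> None:
        # Keep positions contiguous: 1, 2, 3, ...
        for index, ref in enumerate(self.items, start=1):
            ref.position = index

    def insert(self, description: str, position: Optional[int] = None) -> Reference:
        """Insert at a 1-based position; append when position is None."""
        if position is None:
            position = len(self.items) + 1
        if not 1 <= position <= len(self.items) + 1:
            raise ValueError("position out of range")
        ref = Reference(description)
        self.items.insert(position - 1, ref)
        self._renumber()
        return ref

    def remove(self, position: int) -> Reference:
        """Remove the item at a 1-based position and close the gap."""
        if not 1 <= position <= len(self.items):
            raise ValueError("position out of range")
        ref = self.items.pop(position - 1)
        self._renumber()
        return ref


refs = ReferenceList()
refs.insert("Reference A")
refs.insert("Reference C")
refs.insert("Reference B", position=2)  # lands between A and C
print([(r.position, r.description) for r in refs.items])
```

In a database-backed version the renumbering would presumably need to happen inside a transaction, so that two references never share a position.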

Another option would be to order references automatically, as most (?) bibliographic styles recommend. Sort based on a string key, which would by default be generated from the first author's family name and the year of publication. This could also be used to form a unique "id" for each reference (#ref-johnson-2016) that could be hyperlinked within the text (which it really should be!)
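
As a sketch of the string-key idea, the helper below builds a key such as "ref-johnson-2016" from a family name and a year, then sorts by it. The function names and the numeric suffix for duplicate keys are assumptions made for this example. It also assumes the family name and year are already available as separate values, which free-text references may not provide.

```python
import re
import unicodedata
from collections import Counter


def reference_key(family_name, year):
    """Build a key like 'ref-johnson-2016' from a family name and a year."""
    # Decompose accented characters, then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", family_name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii").lower()
    # Collapse runs of other characters into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name).strip("-")
    return f"ref-{slug}-{year}"


def assign_unique_keys(references):
    """Return one key per (family_name, year) pair, in input order.

    The first use of a key is left as-is; later duplicates get a
    numeric suffix (-2, -3, ...). This suffix scheme is an assumption.
    """
    seen = Counter()
    keys = []
    for family_name, year in references:
        base = reference_key(family_name, year)
        seen[base] += 1
        keys.append(base if seen[base] == 1 else f"{base}-{seen[base]}")
    return keys


keys = assign_unique_keys(
    [("Johnson", 2016), ("Goldberger", 2003), ("Johnson", 2016)]
)
print(sorted(keys))
```

The same key can serve as the HTML id of the list item and as the sort key for an alphabetical list.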

tompollard commented 4 years ago

Another option would be to order references automatically, as most (?) bibliographic styles recommend.

In terms of the way that the references are rendered, I think the modern approach is to use a numbering system, rather than inline author names and an alphabetical reference list (which feels old-school to me).

For example, see the reference section in the following links:

tompollard commented 4 years ago

This could also be used to form a unique "id" for each reference (#ref-johnson-2016) that could be hyperlinked within the text (which it really should be!)

It would be great if we could introduce a system for allowing the citations to be easily added in the main text like this.

It could be that each reference requires a reference ID (or we generate one from the content) and then the author uses the ID to add the citation in the text as you suggest.

bemoody commented 4 years ago

In terms of the way that the references are rendered, I think the modern approach is to use a numbering system, rather than inline author names and an alphabetical reference list (which feels old-school to me).

Not going to argue what is more or less modern. There is certainly some appeal in either strategy.

Note that your second example does indeed have numbers shown in the references section, but they're meaningless (the citations are "(Authors, Year)" and the references are in alphabetical order.)

I would, however, find it impossible to keep track of the order of references by hand - if you want to use "order of first citation", then that order should be determined automatically (or at least checked automatically) from the document text.

Perhaps we could say that citations are stored as a particular HTML element:

We obtained heart rate and blood pressure measurements from the MIMIC-III Clinical Database [999]

and, upon saving the page, we scan for these elements, generate an ordered list of unique reference objects, and replace the "999" in each citation with the correct number.

(Obviously we would provide a ckeditor button so people don't have to type that stuff by hand.)
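
The scan-and-renumber step could look roughly like the sketch below. The citation markup used here, an `<a class="citation" href="#KEY">[N]</a>` element, is purely an assumption for the example, since the actual element is not shown in this thread. A regular expression stands in for a proper HTML parser to keep the sketch short.

```python
import re

# Hypothetical citation markup: <a class="citation" href="#KEY">[N]</a>
CITATION = re.compile(
    r'<a class="citation" href="#(?P<key>[^"]+)">\[\d+\]</a>'
)


def renumber_citations(html):
    """Renumber citations by order of first appearance.

    Returns the rewritten HTML and the list of reference keys in the
    order in which they are first cited.
    """
    order = []    # reference keys, in order of first citation
    numbers = {}  # reference key -> assigned citation number

    def replace(match):
        key = match.group("key")
        if key not in numbers:
            order.append(key)
            numbers[key] = len(order)
        return f'<a class="citation" href="#{key}">[{numbers[key]}]</a>'

    # re.sub visits matches left to right, so numbers follow text order.
    return CITATION.sub(replace, html), order


text = (
    'MIMIC-III <a class="citation" href="#ref-johnson-2016">[999]</a> and '
    'PhysioNet <a class="citation" href="#ref-goldberger-2003">[999]</a>, '
    'again <a class="citation" href="#ref-johnson-2016">[999]</a>.'
)
new_text, cited_order = renumber_citations(text)
print(cited_order)
```

The returned key list gives the order for the reference list itself, so the placeholder number an author types never matters.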

Lucas-Mc commented 4 years ago

Is this all too complicated? Each reference already has an ID associated with it which is what I used to sort them at the end in #1061. This ID is generated by the order that the author submits these references in their content submission form. If we wanted to rearrange them at the end for some reason why not just do it using those IDs instead of creating our own? Anyway, it should be by the order of appearance (according to Vancouver style) and if the submitting author doesn't do that we should fix it in the copyedit. (I think Vancouver style is easiest to read and easiest to implement but that's just my two cents.)

tompollard commented 4 years ago

Note that your second example does indeed have numbers shown in the references section, but they're meaningless (the citations are "(Authors, Year)" and the references are in alphabetical order.)

True, that was a bad example!

I would, however, find it impossible to keep track of the order of references by hand - if you want to use "order of first citation", then that order should be determined automatically (or at least checked automatically) from the document text.

Agreed. Last week I had to insert a reference into a paper just before publication, and re-sorting the reference list was horrible. If we can support a LaTeX-style system using IDs linked to a reference, then this seems best.

Perhaps we could say that citations are stored as a particular HTML element ... and, upon saving the page, we scan for these elements, generate an ordered list of unique reference objects, and replace the "999" in each citation with the correct number.

This seems like a good approach. It also separates the ultimate rendering of references from the process of adding them to a document.

tompollard commented 4 years ago

Is this all too complicated? Each reference already has an ID associated with it which is what I used to sort them at the end in #1061.

I haven't looked at the PR yet, but the solution proposed here seems relatively straightforward:

If we wanted to rearrange them at the end for some reason why not just do it using those IDs instead of creating our own?

Depending on choice of style, the reference list would be sorted either (1) numerically in the order in which they appear in the main text or (2) alphabetically by first author (more difficult to do if we don't have structured references).

Anyway, it should be by the order of appearance (according to Vancouver style) and if the submitting author doesn't do that we should fix it in the copyedit.

Why leave manually fixing the order of the references to a copyedit stage? With a clear system, we can ask authors to do the work (e.g. "Please cite the references inline using the IDs").

bemoody commented 4 years ago

Is this all too complicated?

Oh, absolutely! :P

Each reference already has an ID associated with it which is what I used to sort them at the end in #1061.

They have a primary key, and yes, those are currently integers and assigned in monotonic order, but it's not a great idea to rely on that.

Moreover, it needs to be possible to put the references into the desired order (whatever the "desired order" is), and deleting references & re-creating them is not a good way to have to go about that.

This ID is generated by the order that the author submits these references in their content submission form. If we wanted to rearrange them at the end for some reason why not just do it using those IDs instead of creating our own?

Changing primary keys is not a good idea even if the DB permits it, which I don't know if it does. You could say that we'll keep the primary keys as-is and move all of the other fields from one object to another, but that's fraught with problems for other reasons.

Anyway, it should be by the order of appearance and if the submitting author doesn't do that we should fix it in the copyedit.

If the submitting author accidentally swaps [2] and [3], would you even notice? And how could it be less than a nightmare to try to fix it? This is a mechanical task, not one that should be done by a human.
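
To show how mechanical the check is, here is a small sketch that flags citation numbers whose first appearance is out of sequence. It assumes citations are written as plain bracketed numbers such as [2], and it does not handle ranges or lists like [1-3] or [1,2]. The function names are invented for this example.

```python
import re


def first_citation_order(text):
    """Return citation numbers in the order they are first cited."""
    seen = []
    for match in re.finditer(r"\[(\d+)\]", text):
        number = int(match.group(1))
        if number not in seen:
            seen.append(number)
    return seen


def citation_order_problems(text):
    """Return (expected, found) pairs where the first-citation order
    deviates from 1, 2, 3, ...; an empty list means the order is fine."""
    order = first_citation_order(text)
    return [
        (expected, found)
        for expected, found in enumerate(order, start=1)
        if expected != found
    ]


swapped = "Data came from [1] and [3], with labels from [2]."
print(citation_order_problems(swapped))
```

A check like this could run when the page is saved, so a swapped pair is reported to the author instead of relying on someone spotting it during copyediting.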

bemoody commented 4 years ago

I haven't looked at the PR yet, but the solution proposed here seems relatively straightforward

I wouldn't go that far :) but I don't think it should be too hard to implement.