NASA-IMPACT / nasa-apt

Code and issues relevant to the NASA APT project
Apache License 2.0

Multiple references should be enclosed by a single pair of parentheses #784

Open wrynearson opened 1 year ago

wrynearson commented 1 year ago

From: https://github.com/NASA-IMPACT/nasa-apt/issues/781

When multiple references are cited at the same time, there should be only one set of parentheses around all of the references. See the first example image below. The first red box is (Armston et al., 2013) ; (NI-Meister et al., 2010). It should look like this: (Armston et al., 2013; Ni-Meister et al., 2010).
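As a rough illustration of the expected output, here is a minimal sketch that merges separately rendered citations into one parenthesized group. The helper name `mergeCitations` is hypothetical and is not part of APT or citation-js; it only shows the string shape being asked for.

```javascript
// Hypothetical helper (not APT code): merge individually parenthesized
// citations into a single group separated by semicolons.
function mergeCitations(citations) {
  const inner = citations
    // Strip one leading "(" and one trailing ")" from each citation.
    .map((c) => c.trim().replace(/^\(/, "").replace(/\)$/, "").trim())
    // Ignore empty entries so we never emit "(; )".
    .filter((c) => c.length > 0);
  return inner.length > 0 ? `(${inner.join("; ")})` : "";
}

const merged = mergeCitations([
  "(Armston et al., 2013)",
  "(Ni-Meister et al., 2010)",
]);
console.log(merged); // (Armston et al., 2013; Ni-Meister et al., 2010)
```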

Image

@kamicut, let's investigate this in sprint 2.

wrynearson commented 1 year ago

@bwbaker1, in the photo you provided in #781, it looks like there are in fact two different references because the semicolon is the same size as the paragraph text (larger than the references). Did the user add this semicolon manually?

bwbaker1 commented 1 year ago

@wrynearson

  1. The font size looks correct now. I tried force refreshing and adding a new document yesterday afternoon, but it wasn't updated on staging yet. Not sure why it took so long, but it looks good now.

  2. Regarding #781, the user did manually enter the semicolon. However, the parentheses are still incorrect even if there isn't a semicolon. Here is what it looks like:

Screen Shot 2023-08-04 at 7 49 00 AM

To technically be correct, it should look like this: This is a test (BradBaker, n.d.; Baker & Doe, 2021; Doe, 2019).

  3. I just realized that the references are now showing the authors' full names. It should just be the last name. It's also no longer changing three or more authors to et al.

Screen Shot 2023-08-04 at 7 52 16 AM

wrynearson commented 1 year ago

Thanks for the update @bwbaker1.

We'll look into issues 2 and 3.

For issue 3 (the last name part), we're parsing authors using the entered format "LastName1, FirstName1 and LastName2, FirstName2 and ...". If the names are entered like this, the parsing works correctly, because we search for "," and "and" to split the text. If users don't enter names like this, we can't differentiate which name is which because we just have one text field.
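The parsing described above could look roughly like the sketch below. This is not the actual APT implementation: `parseAuthors` and `inTextAuthors` are hypothetical names, the sample author names are made up, and the "&" and "et al." rules are inferred from the examples earlier in this thread.

```javascript
// Hypothetical sketch: split "Last1, First1 and Last2, First2" on the
// word "and" (between authors) and on the first comma (within an author).
function parseAuthors(authors) {
  return authors
    .split(/\s+and\s+/)
    .map((name) => name.trim())
    .filter((name) => name.length > 0)
    .map((name) => {
      const commaIndex = name.indexOf(",");
      if (commaIndex === -1) {
        // No comma: first and last name cannot be told apart, so the
        // whole string ends up being treated as the last name.
        return { last: name, first: "" };
      }
      return {
        last: name.slice(0, commaIndex).trim(),
        first: name.slice(commaIndex + 1).trim(),
      };
    });
}

// Build the author part of an in-text citation from the last names.
function inTextAuthors(authors) {
  const lastNames = parseAuthors(authors).map((a) => a.last);
  if (lastNames.length === 0) return "";
  if (lastNames.length === 1) return lastNames[0];
  if (lastNames.length === 2) return `${lastNames[0]} & ${lastNames[1]}`;
  return `${lastNames[0]} et al.`; // three or more authors
}

console.log(inTextAuthors("Baker, Brad and Doe, Jane")); // Baker & Doe
console.log(inTextAuthors("Doe, Jane and Baker, Brad and Jones, Sam")); // Doe et al.
console.log(inTextAuthors("Brad Baker")); // Brad Baker (no comma, full name kept)
```

The last call shows why a missing comma leads to full names in the rendered citation: with a single text field there is nothing else to split on.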

I just tested this on a document, and it works as expected (both in the view and PDF modes). Here's what the reference looks like:

image

And the result:

image

However, I see that we're not instructing users correctly (we say to enter "first last and first last"). We should update this.

bwbaker1 commented 1 year ago

@wrynearson okay, I wasn't adding a comma between lastname and firstname. I don't remember that being necessary, but it's been a while. Yeah, we just need to update the instructions with a comma. Thanks for checking on this!

wrynearson commented 1 year ago

No problem! Maybe I didn't communicate it properly. I have a PR up: https://github.com/NASA-IMPACT/nasa-apt-frontend/pull/535.

Let me know if this is good.

image

The problem with names is that there are countless formats, standards, etc. This will definitely not fit all name formats, but it should hopefully cover the majority of use cases.

kamicut commented 1 year ago

@wrynearson I've found that citation-js can merge the citations of a single bibtex together. You can see the example at the bottom here: https://runkit.com/kamicut/citation-agu-example.

We would have to change how the reference manager keeps track of citations in the inserted text: the user would have to insert multiple references in one go instead of two separate references so that we know that they should be merged. We can modify the insert reference form to allow for multiple references.

wrynearson commented 1 year ago

Thanks @kamicut! How complex would you estimate the change to be?

bwbaker1 commented 1 year ago

@wrynearson As this is being scoped out, I'm concerned about two things.

  1. How does this affect existing ATBDs?
  2. I'm concerned about multiple references needing to be inserted in one go. Often, authors will come back and add new references as they are working on a document. It would be cumbersome to have to re-add all the references each time this happens.

wrynearson commented 1 year ago

@bwbaker1 are you referring to specifically the issue of multiple references within single parentheses, or the broader reference management discussion?

bwbaker1 commented 1 year ago

@wrynearson I'm referring to multiple references within single parentheses. For example, let's say someone has this:

(Doe et al. 2023; Duncan 2019; Jones 2010) and later on wants to add "Abbott 2011." I think they will get frustrated if they have to re-add all four references.

wrynearson commented 1 year ago

I don't think this would affect existing ATBDs, but we'd test on staging before pushing any changes to production.

Do you have any insight into how often users are coming back and adding additional references into the same citations, or adding new citations?

Currently, we don't have a way to change citations that are already added. Users need to delete the citation, and then re-add a citation if they want to change the reference.

We could probably come up with a frontend solution that lets users add/remove references in multi-reference citations, but I'm worried that we're running up against our Sep 30 deadline for feature development. I would guess that going back in, re-adding all references as a new multi-reference citation, and then deleting the old citation wouldn't take a user more than 15 seconds, but I could be wrong.

kamicut commented 1 year ago

I'm investigating the internal representation of references within APT. Currently, references are imported into APT from BibTex (or from the form) into a non-standard format. Some fields are mapped from Bibtex into APT fields.

Here is the structure that maps fields from Bibtex to APT fields.

const propsToMap = [
  // from -> to
  ['address', 'publication_place'],
  ['author', 'authors'],
  ['doi', 'doi'],
  ['edition', 'edition'],
  ['isbn', 'isbn'],
  ['note', 'other_reference_details'],
  // Journal and series get mapped to the same APT property. If both exist
  // "series" prevails.
  ['journal', 'series'],
  ['journaltitle', 'series'],
  ['booktitle', 'series'],
  ['series', 'series'],

  ['report_number', 'report_number'],
  ['pages', 'pages'],
  ['publisher', 'publisher'],
  ['title', 'title'],
  ['url', 'online_resource'],
  ['volume', 'volume'],
  ['year', 'year'],
  ['date', 'year']
];
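To make the "series prevails" comment concrete, here is a minimal sketch of how a from/to list like this could be applied to a parsed BibTeX entry. It is an assumption about the mechanism, not the actual APT import code: `mapBibtexToApt` is a hypothetical name, the mapping is a shortened copy of the one above, and the sample entry is made up. The sketch assumes pairs are applied in order and later pairs overwrite earlier ones.

```javascript
// Shortened copy of the mapping above: [from (BibTeX), to (APT)].
const propsToMap = [
  ["author", "authors"],
  ["journal", "series"],
  ["series", "series"],
  ["title", "title"],
  ["url", "online_resource"],
  ["year", "year"],
  ["date", "year"],
];

// Hypothetical helper: copy each BibTeX field that is present onto its
// APT property. Pairs are applied in order, so a later pair overwrites
// an earlier one that targets the same APT property.
function mapBibtexToApt(entry) {
  const result = {};
  for (const [from, to] of propsToMap) {
    if (entry[from] !== undefined && entry[from] !== "") {
      result[to] = entry[from];
    }
  }
  return result;
}

// Made-up entry with both "journal" and "series" set.
const apt = mapBibtexToApt({
  author: "Doe, Jane",
  journal: "Example Journal",
  series: "Example Series",
  title: "A test reference",
  year: "2019",
});
console.log(apt.series); // Example Series
```

Under this ordering assumption, "series" wins over "journal" because it comes later in the list. The same rule would also make "date" override "year" when both are present, which may or may not match the real behavior.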

For reference, here are the standard Bibtex fields: https://www.bibtex.com/format/fields/

Ideally, we should not have a mapping and should use citation-js to manage the fields for us. However, I'm wondering if there were specific requirements for non-standard fields and a from<->to mapping from Bibtex before we use citation-js as the internal representation. cc @wrynearson @danielfdsilva

bwbaker1 commented 1 year ago

@wrynearson I discussed the issues related to in-text citations with Aaron.

  1. We both agree the best way (though not ideal) to deal with multiple citations is to have users manually add parentheses and semi-colons. This is extra work for them, but we think this is less frustrating than the alternatives.

Example of how APT would render multiple citations:

This is a test Tang et al. 2019 Armston et al. 2013 Ni-Meister 2010.

The user would add the parentheses and semicolons. It would look like this:

This is a test (Tang et al. 2019; Armston et al. 2013; Ni-Meister 2010).

kamicut commented 5 months ago

@bwbaker1, I'd like to investigate this issue again. Do you know why our internal representation in APT is not Bibtex as mentioned in https://github.com/NASA-IMPACT/nasa-apt/issues/784#issuecomment-1695641227? If we don't need this intermediate representation, we can use Bibtex as the internal representation for references which could help solve this issue.

bwbaker1 commented 5 months ago

@kamicut I'm not sure why the internal representation is not bibtex. I'm fine with changing this if it solves the issue.

bwbaker1 commented 5 months ago

This is a priority for FY24.3.

wrynearson commented 1 month ago

@bwbaker1 before we push this to production, I just want to reiterate that all references would lose their parentheses, and that users would have to add parentheses around all references (regardless of whether there are multiple references as described in this ticket, or not).

Image

Could you confirm that this is the desired behavior on production? cc @kamicut @sunu

bwbaker1 commented 1 month ago

@wrynearson Yes, this is the desired behavior. I've informed the most active science teams about this and they just prefer that things are formatted correctly. Thanks for double checking!