CDLUC3 / dmptool

DMPTool version of the DMPRoadmap codebase
https://dmptool.org
MIT License

Follow-up Tab Outputs Missing Citations on DMP ID Landing Page #631

Open mariapraetzellis opened 1 month ago

mariapraetzellis commented 1 month ago

Outputs listed on the Follow-Up tab include citations within the DMP Tool. However, the citations are not included on the DMP ID landing page.

For example, this plan includes citations on the Follow-Up tab, but on the DMP ID landing page, the “Other works associated with this research project” section lists only the URL with no citation. This example DMP is from the FAIR Island project and will be used in a final funder report, so it would be good to update the citations in the next few weeks.

jupiter007 commented 1 month ago

I ran tests on both dev and stage, and was not able to reproduce the bug.

It appears that changes were made at some point to fix this issue, so we need to go back and add the missing citations to the affected plans. We can probably write a script in dmsp_api_prototype that goes through all DMPs and adds the missing citations.

jupiter007 commented 1 month ago

@briri and @bofstein I created a script called "add-missing-citations-to-dmps.rb", which loops through all unique DMPs in the provided DynamoDB table and finds the records that are missing citations in "dmproadmap_related_identifiers". The script displays a list of the missing citations, along with the associated DMP identifiers. For now, I commented out the part of the code that writes the updated record back to the DynamoDB table, until we validate that all the citations that are going to be added are correct.
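To illustrate the kind of check described above, here is a minimal sketch that works on plain Ruby hashes rather than a real DynamoDB table. It is not the actual "add-missing-citations-to-dmps.rb" script: the `missing_citations` helper, the `dmp_id` key as a plain string, and the `identifier` and `citation` fields inside each related identifier are assumptions made for the example.

```ruby
# Hypothetical helper: report related identifiers that have no citation.
# Assumes each DMP is a Hash with a 'dmp_id' string and an optional
# 'dmproadmap_related_identifiers' array of Hashes (assumed field names).
def missing_citations(dmps)
  dmps.each_with_object({}) do |dmp, report|
    related = Array(dmp['dmproadmap_related_identifiers'])
    missing = related.select { |item| item['citation'].to_s.strip.empty? }
                     .map { |item| item['identifier'] }
    report[dmp['dmp_id']] = missing unless missing.empty?
  end
end

# Made-up sample records, for illustration only.
sample_dmps = [
  {
    'dmp_id' => 'example-dmp-1',
    'dmproadmap_related_identifiers' => [
      { 'identifier' => 'https://example.org/work/1', 'citation' => 'Doe, J. (2023). Example dataset.' },
      { 'identifier' => 'https://example.org/work/2' }
    ]
  },
  { 'dmp_id' => 'example-dmp-2' }
]

# Read-only report: nothing is written back, which mirrors the
# "display first, update later" approach described in the comment.
missing_citations(sample_dmps).each do |dmp_id, identifiers|
  puts dmp_id
  identifiers.each { |identifier| puts "  missing citation: #{identifier}" }
end
```

Keeping the report step separate from any write step makes it easy to review the list before touching the table.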

The changes are on the dmsp_api_prototype branch "test/citation-test".

I had to update a couple of gem files that are used by the Lambda functions in order to generate the list.

  1. In uc3-dmp-citation.rb, I added a check for whether the response is actually a BibTeX string, because I was getting errors when a PDF or HTML markup was returned instead.
  2. In uc3-dmp-external-api/client.rb, I stopped errors from being thrown when the response is a 404. This is possibly not a good permanent solution, but we could change it temporarily in order to get the citations updated for now, and figure out the best solution for a permanent script later.
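The two changes above could look roughly like the following sketch. This is not the code in the uc3-dmp-citation or uc3-dmp-external-api gems: `bibtex?`, `fetch_bibtex`, and the `fetcher` callable (anything that returns a `[status, body]` pair) are hypothetical names, and the regular expression is only a rough heuristic for "looks like a BibTeX entry".

```ruby
# Rough heuristic (an assumption, not a full parser): a BibTeX entry
# starts with "@" followed by an entry type and an opening brace,
# for example "@article{".
BIBTEX_ENTRY = /\A\s*@[A-Za-z]+\s*\{/

def bibtex?(body)
  body.is_a?(String) && body.match?(BIBTEX_ENTRY)
end

# `fetcher` is a hypothetical stand-in for the HTTP client: any callable
# that takes a URL and returns [status_code, body].
def fetch_bibtex(url, fetcher)
  status, body = fetcher.call(url)
  return nil if status == 404 # treat "not found" as "no citation available"
  raise "Unexpected status #{status} for #{url}" unless status == 200

  # Ignore PDF, HTML, or any other non-BibTeX payload.
  bibtex?(body) ? body : nil
end

# Usage with stubbed responses (no network access needed).
not_found = ->(_url) { [404, ''] }
html_page = ->(_url) { [200, '<html><body>Landing page</body></html>'] }
bibtex    = ->(_url) { [200, '@misc{example, title={Example dataset}}'] }

p fetch_bibtex('https://example.org/a', not_found) # nil
p fetch_bibtex('https://example.org/b', html_page) # nil
p fetch_bibtex('https://example.org/c', bibtex)
```

Returning `nil` for a 404 keeps the batch run going, while other unexpected statuses still raise, so that genuine failures are not silently swallowed.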

I've attached a copy of what the script output in my terminal. A total of 27 DMPs were missing citations. I cross-checked a couple of the missing citations against the related works identifiers, and they appear to be correct, but I plan to go through the whole list and confirm that the missing citations listed are the correct ones.

I ran the script against the production table to get the attached list of missing citations.

missing-citations-on-production.pdf

jupiter007 commented 1 month ago

Ok, I went through the list of DMPs that are missing citations, and I confirmed that the information looked correct.

However, I did run across some DMPs that had issues:

The landing page didn't load for the following DMPs, and the console displayed varying errors. I created this ticket, https://github.com/CDLUC3/dmsp_api_prototype/issues/3, to address the bugs.

https://doi.org/10.48321/D1H010
https://doi.org/10.48321/D1MS3M
https://doi.org/10.48321/D1G01P
https://doi.org/10.48321/D12A9C44eb
https://doi.org/10.48321/D1C885
https://doi.org/10.48321/D1KS39
https://doi.org/10.48321/D13S3Z

These related works returned a 404, so no citations were added for them. (I'm wondering whether we want to check for 404s before including these on the landing page.)

https://demo.dataverse.org//dataset.xhtml?persistentId=doi:10.70122/FK2/CMF6SA
https://demo.dataverse.org//dataset.xhtml?persistentId=doi:10.70122/FK2/MOTV7M
https://demo.dataverse.org//dataset.xhtml?persistentId=doi:10.70122/FK2/TUL62V
https://demo.dataverse.org//dataset.xhtml?persistentId=doi:10.70122/FK2/SSSZEH
https://demo.dataverse.org//dataset.xhtml?persistentId=doi:10.70122/FK2/M7QG3S
https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/DDONSY
https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/5HZXPQ

I'm assuming there are cases where citations just aren't returned for various reasons. For example, these identifiers didn't return a citation. I'm guessing it's because the response is not BibTeX, but rather HTML or a different type:

Finally, this DMP had duplicate related works listed. I'm not sure whether we check for duplicates before displaying them on the landing page: https://doi.org/10.48321/D1F88S
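If a duplicate check turns out to be needed, one possible approach is sketched below. The `normalize_identifier` and `dedupe_related_works` helpers and the `identifier` field name are assumptions for illustration, not existing DMP Tool code. The idea is to compare identifiers after stripping common DOI prefixes, so that the same DOI written in different forms counts as one work.

```ruby
# Hypothetical helper: reduce an identifier to a comparable form.
# Downcasing is safe for DOIs (they are case-insensitive) but could merge
# distinct case-sensitive URLs, so treat this as a starting point only.
def normalize_identifier(identifier)
  identifier.to_s.strip.downcase
            .sub(%r{\Ahttps?://(dx\.)?doi\.org/}, '')
            .sub(/\Adoi:/, '')
end

# Keep the first occurrence of each related work, drop later duplicates.
def dedupe_related_works(related)
  related.uniq { |item| normalize_identifier(item['identifier']) }
end

# Made-up sample data, for illustration only.
related = [
  { 'identifier' => 'https://doi.org/10.1234/ABC' },
  { 'identifier' => 'doi:10.1234/abc' },
  { 'identifier' => 'https://example.org/work/2' }
]

dedupe_related_works(related).each { |item| puts item['identifier'] }
```

`Array#uniq` with a block keeps the first element for each key, so the original ordering of the remaining related works is preserved.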

marisastrong commented 1 month ago

Are any of these returning a 502 error? EZID has reported some resolving issues.

https://doi.org/10.48321/D1H010
https://doi.org/10.48321/D1MS3M
https://doi.org/10.48321/D1G01P
https://doi.org/10.48321/D12A9C44eb
https://doi.org/10.48321/D1C885
https://doi.org/10.48321/D1KS39
https://doi.org/10.48321/D13S3Z

briri commented 1 month ago

@marisastrong no, they return a 200 status code. The page is blank, though, due to some errors in the JS code that displays the content.

jupiter007 commented 1 month ago

Brian worked on updates to the missing citations script, and I tested it on stage and production.

I created this PR to run it against production: https://github.com/CDLUC3/dmsp_api_prototype/pull/10

I reviewed the updates and they seem correct. Just waiting for approval to merge.