CDLUC3 / ezid

Restore shoulder-list.txt #523

Closed jsjiang closed 6 months ago

jsjiang commented 7 months ago

The arks.org site retrieves shoulder-list.txt, a text file listing all the EZID shoulders (DOI or ARK), to ensure its configuration is up to date.

The file was generated by the manage.py shoulder-list operation and placed in ezid/static/info/shoulder-list.txt, where the arks.org service could retrieve it from the URL https://ezid.cdlib.org/static/info/shoulder-list.txt.

Something broke with a recent update to EZID: the manage.py shoulder-list operation no longer works and the shoulder-list.txt file is no longer available.

Temporary fix: Dave placed a copy of the shoulder-list.txt file on the ezid-prd instance.

Short term solution: I will test the script and ask Ashley to set up a cron job to update shoulder-list.txt once a week.

Long term solution:

jsjiang commented 7 months ago

From Dave:

The long term solution is to set the NAAN registry entries with the correct information. This would mean no changes to EZID.

That said, it would still be beneficial for EZID to implement an API for listing shoulders; strictly speaking, it would be listing NAANs. This should be done as part of the resolve inflection operation, much like the existing functionality that returns shoulders given a NAAN. For example:

https://ezid.cdlib.org/ark:/99999?info

lists the shoulders for the NAAN 99999.

A request like:

https://ezid.cdlib.org/ark:?info

would return a list of ARK NAANs.
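
The two request shapes above differ only in whether a NAAN follows the scheme. A minimal sketch of building them, assuming a hypothetical helper name (info_url); note that the ark:?info form is the proposal here, not something EZID is confirmed to serve:

```python
# Base URL taken from the examples above.
EZID_BASE = "https://ezid.cdlib.org"


def info_url(naan=None):
    """Build an inflection URL (hypothetical helper, for illustration only).

    With a NAAN: the existing request that lists shoulders for that NAAN.
    Without one: the proposed request that would list ARK NAANs.
    """
    if naan is None:
        return f"{EZID_BASE}/ark:?info"
    return f"{EZID_BASE}/ark:/{naan}?info"


if __name__ == "__main__":
    print(info_url("99999"))
    print(info_url())
```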

The shoulder-list.txt file is not a desirable solution and was only put in place pending updates to the NAAN registry for EZID managed NAANs.

Dave

jsjiang commented 7 months ago

Currently the manage.py shoulder-list operation only writes its output to the log (with the --debug option). It does not save the output to the shoulder-list.txt file.

Also, the log entry format differs slightly from that of the temporary file at https://ezid.cdlib.org/static/info/shoulder-list.txt

Temporary file data format:

impl.nog.shoulder INFO     shoulder - Shoulders:
impl.nog.shoulder INFO     shoulder - ark:/13030/bn        ?1?
impl.nog.shoulder INFO     shoulder - doi:10.5070/L6       Aleph, UCLA Undergraduate Research Journal for the Humanities and Social Sciences
impl.nog.shoulder INFO     shoulder - doi:10.5070/LN4      Alon: Journal for Filipinx American and Diasporic Studies
impl.nog.shoulder INFO     shoulder - doi:10.17953/A3      American Indian Culture and Research Journal
impl.nog.shoulder INFO     shoulder - doi:10.17953/        American Indian Culture and Research Journal (no minter)
impl.nog.shoulder INFO     shoulder - ark:/86073/b3        American University of Beirut
impl.nog.shoulder INFO     shoulder - ark:/99999/fk4       ARK Test
impl.nog.shoulder INFO     shoulder - ark:/99999/fk8       ARK Test (non-expiring)
impl.nog.shoulder INFO     shoulder - doi:10.5070/RJ4      Asian American Research Journal
impl.nog.shoulder INFO     shoulder - doi:10.5070/P3       Asian Pacific American Law Journal
impl.nog.shoulder INFO     shoulder - doi:10.5070/BK8      Backbone
impl.nog.shoulder INFO     shoulder - ark:/85779/j4        Berkeley Law Library
impl.nog.shoulder INFO     shoulder - doi:10.15779/J2      Berkeley Law Library
impl.nog.shoulder INFO     shoulder - doi:10.15779/Z38     Berkeley Law School Journals

Log entry from manage shoulder-list:

shoulder.py:81 shoulder impl.nog_sql.shoulder INFO     shoulder - Shoulders:
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - ark:/13030/bn        ?1?
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - doi:10.5070/L6       Aleph, UCLA Undergraduate Research Journal for the Humanities and Social Sciences
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - doi:10.5070/LN4      Alon: Journal for Filipinx American and Diasporic Studies
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - ark:/86073/b3        American University of Beirut
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - ark:/99999/fk4       ARK Test
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - ark:/10945/          ARK Test (no minter)
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - ark:/99999/fk8       ARK Test (non-expiring)
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - doi:10.5070/RJ4      Asian American Research Journal
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - doi:10.5070/P3       Asian Pacific American Law Journal
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - doi:10.5070/BK8      Backbone
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - ark:/85779/j4        Berkeley Law Library
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - doi:10.15779/J2      Berkeley Law Library
shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - doi:10.15779/Z38     Berkeley Law School Journals
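
The two formats differ only in the text before "shoulder - ", so both reduce to the same shoulder/description pairs. A minimal sketch of that reduction, assuming hypothetical helper names (parse_entry, to_tsv) that are not part of EZID:

```python
import re

# Everything before "shoulder - " is logger prefix and is ignored.
# An entry is an ARK or DOI shoulder, some whitespace, then its description.
_ENTRY = re.compile(r"shoulder - ((?:ark:/|doi:)\S+)\s+(.*\S)\s*$")


def parse_entry(line):
    """Return (shoulder, description), or None for lines such as 'Shoulders:'."""
    match = _ENTRY.search(line)
    return (match.group(1), match.group(2)) if match else None


def to_tsv(lines):
    """Render the entry lines as tab-separated shoulder/description rows."""
    rows = filter(None, (parse_entry(line) for line in lines))
    return "".join(f"{shoulder}\t{description}\n" for shoulder, description in rows)


if __name__ == "__main__":
    sample = [
        "impl.nog.shoulder INFO     shoulder - Shoulders:",
        "shoulder.py:83 shoulder impl.nog_sql.shoulder INFO     shoulder - ark:/99999/fk4       ARK Test",
    ]
    print(to_tsv(sample), end="")
```

Because the prefix is skipped rather than matched, the same function accepts lines from either listing.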

jsjiang commented 7 months ago

@datadavev Hi Dave, do you require the shoulder-list.txt file in the above log format? Would a TSV file with shoulder and description columns work?

Jing

datadavev commented 7 months ago

All that is needed is a list of NAANs managed by EZID. A natural pattern to follow is to provide the list of prefixes (NAANs) given a scheme (ark:). If a client wanted more details on a particular NAAN then they could use an inflection request on the NAAN.

So basically:

https://ezid.cdlib.org/ark:?info -> list of NAANs
https://ezid.cdlib.org/ark:/NAAN?info -> list of shoulders for NAAN

For the list of NAANs, it can be:

{
    "prefixes": [
        "86073",
        "10945",
        ...
    ]
}
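
A response of this shape could be derived from the shoulders EZID already knows about. A minimal sketch, assuming the shoulders are available as plain strings such as "ark:/86073/b3"; the function name naan_prefixes is hypothetical and not an EZID API:

```python
import json


def naan_prefixes(shoulders):
    """Collect the distinct ARK NAANs from shoulder strings, in first-seen order."""
    prefixes = []
    for shoulder in shoulders:
        if not shoulder.startswith("ark:/"):
            continue  # DOI shoulders have no NAAN
        naan = shoulder[len("ark:/"):].split("/", 1)[0]
        if naan and naan not in prefixes:
            prefixes.append(naan)
    return {"prefixes": prefixes}


if __name__ == "__main__":
    sample = ["ark:/86073/b3", "ark:/10945/", "doi:10.5070/L6", "ark:/99999/fk4"]
    print(json.dumps(naan_prefixes(sample), indent=4))
```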

jsjiang commented 7 months ago

Cron job entry

42 14 * * 1-7 /ezid/bin/run-shoulder-list.sh

Wrapper script:

#!/bin/bash
#
# wrapper script providing shell environment to run ezid shoulder-list command 
# from crontab.
#
# This file is managed by puppet

export PYTHONPATH=$HOME/ezid
export DJANGO_SETTINGS_MODULE=settings.settings
PYENV_ROOT=$HOME/.pyenv
$PYENV_ROOT/shims/django-admin shoulder-list --debug > /apps/ezid/ezid/static/info/shoulder-list.txt
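
One caveat with redirecting straight into the published file: the shell truncates the target before the command runs, so a failed run leaves an empty shoulder-list.txt. A minimal sketch of a safer write step, assuming the command output has already been captured as text; the publish helper and the temporary-file prefix are hypothetical, not part of the deployed script:

```python
import os
import tempfile


def publish(text, dest):
    """Atomically replace dest with text; keep the old file if text is empty."""
    if not text.strip():
        return False  # nothing useful was generated, leave the old copy alone
    directory = os.path.dirname(os.path.abspath(dest))
    # The temporary file must live in the same directory so that the
    # final rename stays on one filesystem and is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".shoulder-list.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.chmod(tmp, 0o644)  # mkstemp creates the file owner-only
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

Readers of the URL then see either the previous complete file or the new complete file, never a partial one.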

jsjiang commented 7 months ago

@ashleygould Hi Ashley, can you add run-shoulder-list.sh to the cron on ezid-prd? The script is on ezid-dev in the bin directory: ezid@uc3-ezidui01x2-dev:/apps/ezid/bin/run-shoulder-list.sh. Please set it up to run every day at 3:15am.

Let me know if you have questions.

Thank you

Jing

Note: Puppet removes this file when deploying a new EZID release, so the run-shoulder-list.sh script needs to be run after new code has been deployed.

ashleygould commented 7 months ago

this is complete. please validate the job runs as expected.

ezid@uc3-ezidui02x2-prd:15:40:10:~$ ll bin/run-shoulder-list.sh 
-rwxr-xr-x 1 ezid ezid 491 Dec 15 15:39 bin/run-shoulder-list.sh
ezid@uc3-ezidui02x2-prd:15:40:16:~$ crontab -l
# HEADER: This file was autogenerated at 2023-12-15 15:39:50 -0800 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: ezid-rotate_and_compress_logs
59 23 * * 1-7 /bin/date >> /ezid/var/log/uc3_logrotate.log ; /usr/sbin/logrotate --state /ezid/etc/logrotate.status /ezid/etc/logrotate.conf >> /ezid/var/log/uc3_logrotate.log
# Puppet Name: clearsessions
0 0 * * 0 /ezid/bin/clearsessions.sh
# Puppet Name: link_check_emailer
15 3 * * * /ezid/bin/run-shoulder-list.sh
# Puppet Name: link_check_summary_report
0 3 10 * * /ezid/bin/link_check_summary_wrapper.sh

jsjiang commented 6 months ago

Issue and solution: currently the shoulder-list.txt file is generated by a cron job every day at 3:15am. However, it is deleted whenever we run the EZID deployment script. Ashley and I looked into a few solutions. Since the shoulder-list.txt file is a temporary measure, we think the easiest approach is to run the /ezid/bin/run-shoulder-list.sh command manually after each EZID code deployment.

jsjiang commented 6 months ago

Created a new ticket for creating EZID APIs to list NAANs and shoulders: #549