Center-for-Research-Libraries / vufind

CRL Implementation of VuFind frontend for FOLIO. A library resource discovery portal designed and developed for libraries by libraries.
GNU General Public License v2.0

Finalize shell script local logic for import and incremental scripts #22

Closed ryan-jacobs closed 2 years ago

ryan-jacobs commented 2 years ago

The import and incremental scripts seem to have been sourced from EBSCO. These are critical scripts for harvesting, and it would be nice to make these functional across multiple environments (dev/prod). We need to get some additional background on these scripts though:

ryan-jacobs commented 2 years ago

Possibly related to #20

mmabrahamson commented 2 years ago

A few quick answers to the above questions:

The scripts were created by another Library Service Engineer at EBSCO to help with automating the import processes. Feel free to edit these as needed to fit your needs!

They do run as root. I'm not a security expert, so I'm not sure what concerns there may be, but if you do run into any I'm happy to help figure out workarounds. The script could probably be split out into multiple scripts that were run as specific user types if needed to handle the different user needs.

For the alphabetical browse logic, I'd suggest checking out this page on VuFind's site about how they handle alphabetical browse: https://vufind.org/wiki/indexing:alphabetical_heading_browse. The short answer here is that the import scripts are running what's necessary in VuFind to create this index, and disabling it in VuFind should hide the option for users. If you don't intend to use alphabetical browse, I'd suggest commenting it out of the import and incremental scripts.

ryan-jacobs commented 2 years ago

I'm adjusting the milestone. The current scripts effectively support launch; they are just not optimized for ongoing automation. This can be addressed soon after launch.

ryan-jacobs commented 2 years ago

It should be possible to run these scripts as a non-root user as long as we are careful about file and process permissions. Notably:

  1. The script needs to restart solr, so it must run as a user with appropriate perms
  2. The script needs to be aware of some vufind environment vars, so the running user needs these in their shell, or they need to be set globally.
  3. The script needs to clear caches, so it needs appropriate file-level perms there

Our www-admin user is already the user starting solr, so that addresses num 1. In production we also set up env vars inside /etc/profile.d/vufind.sh, so that covers num 2. See: https://workspace.crl.edu/display/SYS/VuFind
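For num 2, a guard near the top of each script could fail fast when the environment is incomplete. This is only a sketch: `require_env` is a hypothetical helper name, and the variables to check (for example VuFind's `VUFIND_HOME` and `VUFIND_LOCAL_DIR`) are an assumption about what the scripts actually rely on.

```shell
# require_env VAR...
# Hypothetical helper: returns 0 if every named variable is set and
# non-empty, otherwise prints one message per missing variable to stderr
# and returns 1.
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Portable indirect lookup of the variable whose name is in $name.
    # Callers must pass trusted, literal variable names only.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Since cron jobs typically do not load login-shell profiles, a script could first run `[ -r /etc/profile.d/vufind.sh ] && . /etc/profile.d/vufind.sh` and then `require_env VUFIND_HOME VUFIND_LOCAL_DIR || exit 1`.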

To address num 3, we need to ensure cache files are written with group write perms, as our www-admin user is in the www-data group. While it's possible to simply set up Apache with a specific umask value (like 002), it does not look like that will work here, as VuFind appears to explicitly declare perms when writing out the cache files. So we need to alter config details to do this instead (in the instance-specific config.ini):

[Cache]
; Ensure that production cache files have group write permissions. This allows
; them to be cleared and managed by any user with www-data group perms, such 
; as our www-admin user.
umask = 002
dir_permission = 0770
file_permission = 0660
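
One way to confirm the setting has taken effect is to look for cache entries that still lack group write permission. The sketch below assumes the cache lives under `$VUFIND_LOCAL_DIR/cache`, which may not match every instance; `list_non_group_writable` is a hypothetical helper name.

```shell
# list_non_group_writable DIR
# Hypothetical helper: prints every file or directory under DIR
# (including DIR itself) that does not have the group write bit set.
# Empty output means a www-data group member can manage the whole tree.
list_non_group_writable() {
  local dir="$1"
  # -perm -020 matches entries with group write set; "!" inverts the match.
  find "$dir" \( -type f -o -type d \) ! -perm -020 -print
}
```

For example, `list_non_group_writable "$VUFIND_LOCAL_DIR/cache"` should print nothing once the cache has been regenerated with the new permissions. Files written before the config change keep their old perms until they are cleared or fixed with chmod.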

ryan-jacobs commented 2 years ago

We should also add a check to the scripts to validate which user is active so that folks don't accidentally run the script as a personal user, or as root. However, we do want to give the option to run as another user in dev environments where permissions are much more liberal (e.g. MAMP). Only if a non-www-admin user is detected does anything special need to happen, such as a prompt. In fact, we don't want anything to happen when the user is www-admin so as to be compatible with non-interactive cron jobs.
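
A minimal sketch of such a check follows. It is not the code from the actual commits: `check_run_user` and `EXPECTED_USER` are hypothetical names, and treating an unanswered prompt as "no" is an assumption.

```shell
# Hypothetical setting: the user the scripts are expected to run as.
EXPECTED_USER="${EXPECTED_USER:-www-admin}"

# check_run_user CURRENT_USER
# Returns 0 if the script may proceed, 1 if it should abort.
check_run_user() {
  local current_user="$1"
  local reply

  # Expected user: proceed silently so cron jobs never block on a prompt.
  if [ "$current_user" = "$EXPECTED_USER" ]; then
    return 0
  fi

  # Anyone else (a personal user, or root) must explicitly confirm.
  printf 'Running as "%s", not "%s". Continue? [y/N] ' \
    "$current_user" "$EXPECTED_USER" >&2
  # A failed read (for example, no input available) counts as "no".
  read -r reply || reply=""
  case "$reply" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

Near the top of each script this could be called as `check_run_user "$(id -un)" || exit 1`. Defaulting to "no" means a cron job accidentally installed under the wrong account aborts instead of hanging or proceeding.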

ryan-jacobs commented 2 years ago

This should be ready to go based on the last couple of commits on this issue. Flagging for QA so that we can test in production after deploy before closing (bash handling may differ in subtle ways across environments).

Let's also confirm incremental.sh is running as planned via cron before closing.

ryan-jacobs commented 2 years ago

Tested on prod. It all looks good.