CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

ETDs database cleanup per ingest issues on 6/24 and 9/3 #411

Closed elopatin-uc3 closed 3 years ago

elopatin-uc3 commented 4 years ago

Several ETD ingests failed during a morning outage on 6/24. The outage was caused by a Storage scratch disk issue. Based on the ETDs log on uc3-etdx2-prd, the associated database entries do not exist. The failed submissions were resubmitted manually a day later, but the corresponding updates to the db were not completed.

elopatin-uc3 commented 4 years ago

2020-07-20 13:31:16,765 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/etdadmin_upload_718715.zip not found in ETD db
2020-07-20 13:31:16,781 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/etdadmin_upload_744797.zip not found in ETD db
2020-07-20 13:31:16,796 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/11801.zip not found in ETD db
2020-07-20 13:31:16,812 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/18985.zip not found in ETD db
2020-07-20 13:31:16,827 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/etdadmin_upload_743146.zip not found in ETD db
2020-07-20 13:31:16,841 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/19028.zip not found in ETD db
2020-07-20 13:31:16,854 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/19029.zip not found in ETD db
2020-07-20 13:31:16,868 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/19021.zip not found in ETD db
2020-07-20 13:31:16,881 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/19023.zip not found in ETD db
2020-07-20 13:31:16,894 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/19012.zip not found in ETD db
2020-07-20 13:31:16,908 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/etdadmin_upload_735764.zip not found in ETD db
2020-07-20 13:31:16,921 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/etdadmin_upload_739490.zip not found in ETD db
2020-07-20 13:31:16,934 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/etdadmin_upload_741576.zip not found in ETD db
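
For anyone who wants a quick list of the affected packages, here is a minimal sketch that pulls the zip file names out of log lines like the ones above. The function name `missing_zip_names` and the regular expression are my own assumptions for illustration, not part of the ETDs service; the pattern only relies on the "ERROR: <path>.zip not found in" wording seen in this log.

```python
import re

# Matches the path of a zip file reported as missing from the ETD db.
# The pattern stops at "not found in", so it works whether or not the
# trailing "ETD db" wrapped onto the next line.
MISSING_RE = re.compile(r"ERROR: (\S+\.zip) not found in")


def missing_zip_names(log_text):
    """Return base names of zip files the log reports as missing, in log order."""
    return [path.rsplit("/", 1)[-1] for path in MISSING_RE.findall(log_text)]


# Two lines copied from the log above, one wrapped and one not.
sample = (
    "2020-07-20 13:31:16,765 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/"
    "etdadmin_upload_718715.zip not found in \nETD db\n"
    "2020-07-20 13:31:16,796 ERROR: /apps/etds/apps/uc3-etds/zipfiles/20200624/"
    "11801.zip not found in ETD db\n"
)

print(missing_zip_names(sample))
```

The resulting names could then be compared against whatever was resubmitted manually on 6/25.
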
elopatin-uc3 commented 3 years ago

I've discovered that four of the eight ETDs that are missing MARC records (UCSD) were those which had to be manually ingested after we had ingest and storage issues on June 24th:

Carrillo, Angelina (ID: 19377)
Cole, Devlin (ID: 19463)
Courchesne, Natasia (ID: 19405)
Lo, Shelton (ID: 19183)

I've downloaded a copy of the ETDs database to investigate further.

elopatin-uc3 commented 3 years ago

Harvesting UC Irvine ETDS.
...Server returned HTML instead of metadata
Item error: No metadata file found in feed data for item 'qt7vw104b1' (ext ark:/13030/m5j67n38)
Traceback (most recent call last):
  File "erep/xtf/control/tasks/merrittHarvest.py", line 324, in harvestCollection
    self.process(host, entry, collectionName)
  File "erep/xtf/control/tasks/merrittHarvest.py", line 229, in process
    raise ItemError("No metadata file found in feed data for item '%s' (ext %s)" % (ark, origId))
harvestBase.ItemError: No metadata file found in feed data for item 'qt7vw104b1' (ext ark:/13030/m5j67n38)

Harvesting UC Santa Barbara ETDS.
...Server returned HTML instead of metadata
Item error: No metadata file found in feed data for item 'qt8s87c3v5' (ext ark:/13030/m5cr5v4d)
Traceback (most recent call last):
  File "erep/xtf/control/tasks/merrittHarvest.py", line 324, in harvestCollection
    self.process(host, entry, collectionName)
  File "erep/xtf/control/tasks/merrittHarvest.py", line 229, in process
    raise ItemError("No metadata file found in feed data for item '%s' (ext %s)" % (ark, origId))
harvestBase.ItemError: No metadata file found in feed data for item 'qt8s87c3v5' (ext ark:/13030/m5cr5v4d)
elopatin-uc3 commented 3 years ago

New metadata issues are showing up in eSchol harvests. They stem from duplicate entries in our Atom feeds. The teams are treating this as an issue that is partially addressed by repeated runs of the eSchol harvester. However, the Merritt team still needs to fix the Atom feed bug: #478
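
As a rough way to spot the problem from the consumer side, here is a small sketch that reports `<id>` values appearing on more than one `<entry>` in an Atom document. It assumes the duplicates share the same Atom `<id>`, which this thread does not confirm; the function name `duplicate_entry_ids` and the sample feed (including its `urn:example:` ids) are made up for illustration and are not taken from a Merritt feed.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287), in ElementTree's "{uri}tag" form.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def duplicate_entry_ids(feed_xml):
    """Return each <id> value that appears on more than one <entry>, once each."""
    root = ET.fromstring(feed_xml)
    seen, dupes = set(), []
    for entry in root.iter(ATOM_NS + "entry"):
        entry_id = entry.findtext(ATOM_NS + "id")
        if entry_id in seen and entry_id not in dupes:
            dupes.append(entry_id)
        seen.add(entry_id)
    return dupes


# Hypothetical feed with one repeated entry id.
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:example:item-1</id></entry>
  <entry><id>urn:example:item-2</id></entry>
  <entry><id>urn:example:item-1</id></entry>
</feed>"""

print(duplicate_entry_ids(sample_feed))
```
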

elopatin-uc3 commented 3 years ago

After examining the MARC record workflow on the ETDs server (createmarc.py plus the etd.db and notes here), the cause turned out to be that the four remaining missing records had no entries in the merritt_ingest table. Creating an entry for one of these ETDs and then running createmarc.py resulted in generation and delivery of its record to OCLC, and delivery of the .csv to UCSD. I will do the same for the remaining three today or tomorrow.
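
For future checks of this kind, here is a minimal sketch that reports which IDs have no row in a table. It assumes etd.db is a SQLite file and that merritt_ingest has an ID column; the column name `etd_id`, the function name `ids_without_ingest_row`, and the demo IDs are all hypothetical, since the real schema is not shown in this thread. The demo runs against an in-memory database, not the real etd.db.

```python
import sqlite3


def ids_without_ingest_row(conn, etd_ids, table="merritt_ingest", id_column="etd_id"):
    """Return the ids from etd_ids that have no matching row in the table.

    The table and column names are interpolated into the SQL because
    identifiers cannot be bound as parameters; pass trusted constants only.
    The id_column default is a hypothetical name, not the real schema.
    """
    query = f"SELECT 1 FROM {table} WHERE {id_column} = ? LIMIT 1"
    return [i for i in etd_ids if conn.execute(query, (i,)).fetchone() is None]


# Demo against an in-memory database with a made-up one-column schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merritt_ingest (etd_id INTEGER)")
conn.execute("INSERT INTO merritt_ingest (etd_id) VALUES (101)")

print(ids_without_ingest_row(conn, [101, 102, 103]))
```

Any ID this reports would be a candidate for a manual merritt_ingest entry before rerunning createmarc.py.
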

elopatin-uc3 commented 3 years ago

The four ETDs that were missing records have now all been processed:

Carrillo, Angelina (ID: 19377)
Cole, Devlin (ID: 19463)
Courchesne, Natasia (ID: 19405)
Lo, Shelton (ID: 19183)