VTUL / vtechworks

DSpace at Virginia Tech
http://vtechworks.lib.vt.edu

OAI-PMH harvest fails with 500 HTTP error #725

Closed alawvt closed 4 years ago

alawvt commented 4 years ago

EBSCO reported an error with their OAI-PMH harvest of VTechWorks.

ERROR: Could not harvest from https://vtechworks.lib.vt.edu/oai/request: Error while harvesting using setSpec 'com_10919_5553'. Will not continue. System.Net.WebException: The remote server returned an error: (500) Internal Server Error.

2020-09-17 10:07:55,441 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&metadataPrefix=qdc&set=com_10919_5553
2020-09-17 10:07:55,829 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f100
2020-09-17 10:07:56,145 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f200
2020-09-17 10:07:56,443 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f300
2020-09-17 10:07:56,798 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f400
2020-09-17 10:07:57,058 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f500
2020-09-17 10:07:57,352 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f600
2020-09-17 10:07:57,877 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f700
2020-09-17 10:07:58,210 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f800
2020-09-17 10:07:58,519 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f900
2020-09-17 10:07:58,852 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f1000
2020-09-17 10:07:59,151 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f1100
2020-09-17 10:07:59,452 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f1200
2020-09-17 10:07:59,869 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f1300
2020-09-17 10:08:00,120 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f1400
2020-09-17 10:08:00,365 INFO : https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc%2f%2f%2fcom_10919_5553%2f1500
2020-09-17 10:08:21,792 ERROR: Error saving an xml document: The remote server returned an error: (500) Internal Server Error.

This was a harvest of the College of Science community. The request https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc///com_10919_5553/1500 should have listed records #1501-1600 but failed. The next request, https://vtechworks.lib.vt.edu/oai/request?verb=ListRecords&resumptionToken=qdc///com_10919_5553/1600, which lists records #1601-1700, succeeded. That indicates an error (probably in the XML) in one of the records between 1501 and 1600 in this community.
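The page-by-page check described above can be sketched in Python. This is only an illustration, not the procedure actually used: the token layout `qdc///<set>/<offset>` and the page size of 100 are read off the harvester log, and the function names (`page_url`, `http_status`, `failing_offsets`) are made up for this example.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://vtechworks.lib.vt.edu/oai/request"


def page_url(set_spec, offset, prefix="qdc"):
    """Build the ListRecords URL for the page that starts after `offset` records."""
    if offset == 0:
        # The first page is requested with metadataPrefix and set, as in the log.
        params = {"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec}
    else:
        # Assumption: later pages use the token layout seen in the log.
        token = f"{prefix}///{set_spec}/{offset}"
        params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{BASE_URL}?{urlencode(params)}"


def http_status(url, timeout=60):
    """Return the HTTP status code for `url`, including error codes such as 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def failing_offsets(set_spec, total_records, page_size=100, status_of=http_status):
    """Return the offsets of every page that does not answer with HTTP 200."""
    bad = []
    for offset in range(0, total_records, page_size):
        if status_of(page_url(set_spec, offset)) != 200:
            bad.append(offset)
    return bad
```

Calling `failing_offsets("com_10919_5553", 1700)` would request each page in turn; a returned offset of 1500 would mean the broken record lies somewhere in records 1501-1600. Because each page is requested directly by offset, the scan can continue past a failing page, which a normal harvester cannot do.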

I used similar requests for the subcommunities and collections within the College of Science to narrow the problem to item #130 in the Scholarly Works, Department of Mathematics collection. I then downloaded the metadata for this collection. Fortunately, the default order of items in the metadata matched the order of items in the OAI-PMH record listing, which let me identify the item as http://hdl.handle.net/10919/78922. An OAI-PMH request for this record, https://vtechworks.lib.vt.edu/oai/request?verb=GetRecord&metadataPrefix=qdc&identifier=oai:vtechworks.lib.vt.edu:10919/78922, failed, confirming that this item was the cause.
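The single-record confirmation can be written as a small Python check. This is a sketch under assumptions: the identifier pattern `oai:vtechworks.lib.vt.edu:<handle>` is copied from the GetRecord URL above, and the helper names are hypothetical. It only detects the HTTP 500 failure seen here; OAI-PMH protocol errors such as `idDoesNotExist` are returned inside a normal HTTP 200 response and would need the XML to be inspected.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://vtechworks.lib.vt.edu/oai/request"
REPOSITORY_ID = "vtechworks.lib.vt.edu"  # taken from the identifier in the URL above


def get_record_url(handle, prefix="qdc"):
    """Build the GetRecord URL for a handle such as '10919/78922'."""
    identifier = f"oai:{REPOSITORY_ID}:{handle}"
    params = {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    # Keep ':' and '/' unescaped so the URL matches the form used in this issue.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


def http_status(url, timeout=60):
    """Return the HTTP status code for `url`, including error codes such as 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def record_is_harvestable(handle, status_of=http_status):
    """True when the GetRecord request for `handle` answers with HTTP 200."""
    return status_of(get_record_url(handle)) == 200
```

For example, `record_is_harvestable("10919/78922")` would have returned `False` while the item was broken.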

This item was a SWORD deposit, and its abstract has many special characters. The original metadata in the SWORD XML file is in MathML, although the abstract was copied and pasted from the PDF. I will attempt to fix this abstract.

I will also create a new issue to enable MathJax in DSpace.

alawvt commented 4 years ago

I experimented with the abstract and eventually determined that its encoding was OK and that the problem was with the field itself. For instance, I could copy the abstract to a new field, but I could not delete the old field. So I exported the metadata for the items, edited the CSV, and deleted the field that way. Even after that, the item still would not show up in OAI-PMH. Finally, I duplicated the item as http://hdl.handle.net/10919/99989 and deleted the problem item, http://hdl.handle.net/10919/78922. The new item is now available in OAI at https://vtechworks.lib.vt.edu/oai/request?verb=GetRecord&metadataPrefix=qdc&identifier=oai:vtechworks.lib.vt.edu:10919/99989.

It remains undetermined what actually caused this problem.

I would like to check all the items in VTechWorks for OAI harvesting, since there might be others. Continued in #728.
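One possible shape for such a repository-wide check, sketched in Python: collect identifiers from `ListIdentifiers` responses, then test each one with a per-record check. This is not what #728 implements. The element names and namespace come from the OAI-PMH 2.0 schema; the function names and the `record_ok` callback are hypothetical, and it is an untested assumption that `ListIdentifiers` still succeeds when a record's metadata is broken.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_identifiers(xml_text):
    """Return (identifiers, resumption_token) from one ListIdentifiers response.

    `resumption_token` is None when the response carries no token or an
    empty one, which in OAI-PMH marks the end of the list.
    """
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return identifiers, token


def unharvestable(identifiers, record_ok):
    """Return the identifiers for which the caller-supplied check fails.

    `record_ok` is a placeholder for whatever per-record test is used,
    for example a GetRecord request that must answer with HTTP 200.
    """
    return [identifier for identifier in identifiers if not record_ok(identifier)]
```

A driver would fetch `verb=ListIdentifiers&metadataPrefix=qdc`, call `parse_list_identifiers` on each response, follow the returned token until it is `None`, and pass the collected identifiers to `unharvestable`.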