Open elopatin-uc3 opened 4 years ago
The createmarc.py script ran, which resulted in the following errors related to the two .unx files I uploaded to the server:
2020-10-01 13:31:31,493 ERROR: UNX file UC Santa Cruz MARC Q1 2020.unx not converted; missing ERROR 9798662497047
ERROR 9798662496941
ERROR 9798662481428
ERROR 9798662553729
ERROR 9798662481190
ERROR 9798662481695
ERROR 9798662481732
2020-10-01 13:31:35,071 ERROR: UNX file UC Irvine MARC Aug 2020.unx not converted; missing ERROR 9798662428157
ERROR 9798662428133
ERROR 9798662427846
ERROR 9798662427839
ERROR 9798662428775
ERROR 9798662428027
Related code in createmarc.py
# test if all ISBNs are available
test_str = xml_saxon_transform(namespace_xmlstr, constants.TEST_XSLT)
# convert using campus customizations using XSLT
if "ERROR" not in test_str:
    if campuscode is not None:
        campus_stylesheet = os.path.join(app_configs[hostenv]['xsl_dir'],
                                         campus_configs[campuscode]['pqmarcxslt'])
        campus_xml_str = xml_saxon_transform(namespace_xmlstr, campus_stylesheet)
        outfilename = campuscode+time.strftime("%Y%m%d")+'PQ-orig.xml'
        outfullpath = os.path.join(app_configs[hostenv]['marc_dir'],
                                   outfilename)
        campus_xml_file = codecs.open(outfullpath, 'wb')
        campus_xml_file.write(campus_xml_str)
        campus_xml_file.close()
    else:
        logging.error("ERROR: campus code not found %s", marcfilename)
else:
    logging.error("ERROR: UNX file %s not converted; missing %s",
                  marcfilename, test_str)
Hi Eric,
I still get notifications from this repo, which I mostly ignore. I saw this one, though, and thought I'd provide some context. This error is generated when an ETD for which we've gotten a MARC record isn't available in ProQuest yet. This means that if you clicked on the URL in the MARC record they've provided, you'd get an error message. This happens sometimes, although it looks like there are quite a few in this latest group. (The ID in the error message is an ISBN.)
Generally these are cleared up quickly; there can be a short lag in their processing. I asked them once to explain their workflow so I could understand this better, but they didn't. If the error persists, you can contact them and they'll pull a chain somewhere and the missing ones show up.
Hope this is helpful. Let me know if you have questions about any of this. I might be able to explain (if I can still remember).
Best wishes,
Perry
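If the errors persist and a list of the affected ISBNs is needed for a message to ProQuest, something like the following could pull them out of the log text. This is only a minimal sketch: the function name `missing_isbns` and the 13-digit pattern are assumptions based on the log lines quoted in this issue, not part of createmarc.py.

```python
import re

# Matches lines such as "ERROR 9798662497047". The 13-digit length is an
# assumption based on the ISBNs shown in the log excerpt in this issue.
MISSING_ISBN = re.compile(r"ERROR\s+(\d{13})\b")


def missing_isbns(log_text):
    """Return the ISBNs reported as missing, in order, without duplicates."""
    seen = []
    for isbn in MISSING_ISBN.findall(log_text):
        if isbn not in seen:
            seen.append(isbn)
    return seen


sample = """2020-10-01 13:31:35,071 ERROR: UNX file UC Irvine MARC Aug 2020.unx not converted; missing ERROR 9798662428157
ERROR 9798662428133
ERROR 9798662427846"""

print(missing_isbns(sample))
```

The leading "ERROR:" of the log line is not picked up because the pattern requires whitespace followed by digits directly after the word ERROR.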
@cpwillett Hi Perry, It's good to hear from you. I was referring to the ETD operations doc you'd put together, and along with it, this note provides more context – thanks! I hadn't realized the string in each error message is an ISBN (but should have). I've been in touch with several folks at ProQuest over the past half year or so, so may reach out to one or two for an explanation about the delay you mention. Perhaps they'll have more to share (or not). Either way, I'll re-run these two UNX files next week.
Since we're here, you may be able to confirm another bit of information. The campus.yml settings for UCSC show:
create_marc: False
delivery_marc: True
I assume create_marc is set to False because we receive records from PQ. But confirmation on this would be great, especially since we receive records from PQ for Merced as well, and settings for Merced differ:
create_marc: True
delivery_marc: True
Thanks again. Hope you're doing well. David and Mark say "hello." Best, Eric
ps. Daniella, Maria and Marisa say hi too (John's out today). Maria notes, "all the non-UC EZID DOI users are finally transferred to DataCite!" And from Mark: Go Cubs! Oops, I mean Go Cardinals And... Brian, Scott and John ("Piscotty was a great trade!") say hello too.
Hi Eric,
There are two different kinds of MARC records. The first kind is "created" from scratch after the ETD is first published in eScholarship. The second kind is an XSLT transformation of a MARC record "delivered" by ProQuest several weeks after the ETD is received. The terminology is a little confusing; I had to work a little to remember the difference myself. Sorry about that.
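To make the distinction concrete, here is a rough sketch of how the two flags could map onto the two kinds of record. The helper `marc_workflows` is hypothetical; the flag names come from campus.yml as quoted earlier in this thread, but how createmarc.py actually reads them is not shown here.

```python
# Hypothetical helper: maps the campus.yml flags quoted in this thread to
# the two kinds of MARC record described above. Not taken from the repo.
def marc_workflows(campus_config):
    """List which kinds of MARC record a campus is configured to receive."""
    workflows = []
    if campus_config.get("create_marc"):
        workflows.append("created from scratch after eScholarship publication")
    if campus_config.get("delivery_marc"):
        workflows.append("XSLT transform of the ProQuest-delivered record")
    return workflows


# Settings as quoted earlier in this thread.
ucsc = {"create_marc": False, "delivery_marc": True}
merced = {"create_marc": True, "delivery_marc": True}

print("UCSC:", marc_workflows(ucsc))
print("Merced:", marc_workflows(merced))
```

Under this reading, the two flags are independent, so a campus such as Merced can have both kinds of record enabled at once.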
I also remembered something else about my previous message--here's how this check works. The ISBN is extracted from the PQ MARC record. It checks whether that ISBN is in the database (or the XML serialization of it). If not, it generates the error message. It could be that the ETD isn't in the Proquest database (puzzling and annoying), or it could be that it's there, but hasn't been matched by the pqgateway.py script and needs some manual intervention for some reason.
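In plain Python, the logic of that check looks roughly like the sketch below. It is an illustration only: per the code quoted above, the real test is an XSLT (constants.TEST_XSLT) run through xml_saxon_transform, and the function name `check_isbns` and the in-memory set of known ISBNs are stand-ins I have made up.

```python
# Illustrative stand-in for the XSLT-based test; not the actual implementation.
def check_isbns(record_isbns, known_isbns):
    """Return one 'ERROR <isbn>' line per ISBN not found, or '' if all are found."""
    missing = [isbn for isbn in record_isbns if isbn not in known_isbns]
    return "\n".join("ERROR " + isbn for isbn in missing)


# Hypothetical data: one ISBN from the PQ MARC file is known, one is not.
known = {"9798662428133"}
test_str = check_isbns(["9798662428157", "9798662428133"], known)

# Same branching as in createmarc.py: convert only if nothing is missing.
if "ERROR" not in test_str:
    print("all ISBNs available; file can be converted")
else:
    print("not converted; missing " + test_str)
```

This also shows why a single missing ISBN blocks the whole .unx file: the conversion branch runs only when the test output contains no ERROR line at all.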
Give my best to everyone. It's autumn in south-central Michigan. Tell Mark I'm watching the Cubs game!
Perry
Latest update – the .unx files being delivered by ProQuest seem to be at least part of the problem. PQ sent me .mrc files and I am comparing the two. At this point, two abbreviated .mrc files have processed without the errors for the aforementioned ETDs.
Summary
Sarah Lindsey, the head of metadata services at the UCSC Library, contacted us about .mrc MARC record files that were once regularly delivered to them. These have not come through for some time now. The library apparently went through a system migration on their side, and she now has the time to work with the records we (used to) send. https://cdl.freshdesk.com/a/tickets/79034
It's important to note that in the case of UCSC, we receive records in the form of .unx files directly from ProQuest on a roughly quarterly basis (sometimes more frequently). These are manually uploaded to the ETDs server and processed when the createmarc.py script runs. I've executed these uploads regularly and have noted that the PQ .unx files are deleted after the script runs. If we need to re-process these, I can probably dig up a series of them as email attachments from PQ.
Tasks
campus.yml config and update create_marc line item (it was set to false; set it to true)
campus.yml for UCSC