StateLibraryVictoria / metadata-clean-up

A Code Club project to create some tools to help clean-up metadata in the Library's catalogue systems

Parse MARC XML records #3

Closed susannah-slv closed 8 months ago

susannah-slv commented 9 months ago

Overview

Create a script that reads the data returned from the API from a file and parses it into comprehensible MARC that we can work with. This will require loading the JSON into Python data, selecting the record part of the returned data, and parsing the XML in that record.
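The three steps above could be sketched roughly as follows. This is only an illustration, not the project's script: the key name record_xml and the sample response are hypothetical stand-ins for whatever the API actually returns, and the XML is parsed with the standard library's xml.etree.ElementTree rather than pymarc so that the sketch runs without extra dependencies.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical API response: the real response shape and key names are assumptions.
SAMPLE_RESPONSE = json.dumps({
    "id": "example-1",
    "record_xml": (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<leader>00000nam a2200000 a 4500</leader>'
        '<controlfield tag="001">example-1</controlfield>'
        '<datafield tag="245" ind1="1" ind2="0">'
        '<subfield code="a">An example title</subfield>'
        '</datafield>'
        '</record>'
    ),
})


def extract_fields(response_text, record_key="record_xml"):
    """Return a list of (tag, value) pairs from the MARC XML in a JSON response.

    Control fields yield (tag, text); data fields yield
    (tag, [(subfield_code, text), ...]) so repeated subfields are kept.
    """
    # Step 1: transform the JSON text into Python data.
    data = json.loads(response_text)
    # Step 2: select the record part of the returned data.
    record_xml = data[record_key]
    # Step 3: parse the XML.
    root = ET.fromstring(record_xml)

    fields = []
    for control in root.findall("marc:controlfield", MARC_NS):
        fields.append((control.get("tag"), control.text))
    for datafield in root.findall("marc:datafield", MARC_NS):
        subfields = [
            (sub.get("code"), sub.text)
            for sub in datafield.findall("marc:subfield", MARC_NS)
        ]
        fields.append((datafield.get("tag"), subfields))
    return fields


fields = extract_fields(SAMPLE_RESPONSE)
```

If pymarc is available, the selected XML could be handed to it instead of ElementTree to get full record objects; the JSON loading and record selection steps would stay the same.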

JSON

Leads:

XML

Potential leads:

susannah-slv commented 8 months ago

It took a while to figure out that not passing an encoding to open() for files being handed to pymarc caused the underlying XML handling to try to parse them with some other encoding. Specifying encoding="utf-8" together with errors="backslashreplace" when loading seems to be doing the trick. Another option I considered was creating a decoder with json.JSONDecoder(strict=False). This would retain all the escaped characters in a form that JSON would be able to send back. However, I didn't go too far down this path, as I thought it might cause issues if we needed to update special characters in the metadata.
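A small sketch of the two options mentioned above, using only the standard library. The file contents and the helper name read_text_leniently are made up for illustration. With errors="backslashreplace", bytes that are not valid UTF-8 are kept as visible escape sequences instead of raising an error. For the second option, Python's documentation describes strict=False as allowing control characters (such as a raw newline) inside JSON strings, which the default decoder rejects.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET


def read_text_leniently(path):
    """Read a file as UTF-8, turning undecodable bytes into backslash escapes."""
    with open(path, encoding="utf-8", errors="backslashreplace") as handle:
        return handle.read()


# Option 1: explicit encoding plus a lenient error handler.
# The sample file holds a Latin-1 byte (0xE9) that is not valid UTF-8.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as handle:
    handle.write(b'<subfield code="a">caf\xe9</subfield>')
try:
    text = read_text_leniently(path)
finally:
    os.remove(path)

# The bad byte survives as the literal characters \xe9, so the XML still parses.
recovered = ET.fromstring(text).text

# Option 2: a JSON decoder that tolerates control characters inside strings.
lenient = json.JSONDecoder(strict=False)
payload = '{"note": "line one\nline two"}'  # raw newline inside a JSON string
decoded = lenient.decode(payload)

# The default (strict) decoder rejects the same payload.
try:
    json.loads(payload)
    strict_rejected = False
except json.JSONDecodeError:
    strict_rejected = True
```

One trade-off with the first option is that the replaced bytes end up in the data as escape text rather than the original character, so they may need cleaning up later if the metadata is written back.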