NEU-Libraries / cerberus

Digital Repository Service

XML loader rejecting records without topical subject terms #1071

Closed: sarahjeansweeney closed this issue 7 years ago

sarahjeansweeney commented 7 years ago

The XML loader is rejecting files in an XML metadata load that lack topical subject terms (similar to issue #748).

Michelle submitted this load yesterday (https://repository.library.northeastern.edu/loaders/xml/report/1421), and more than 90 records failed with this error: "Error - Must have at least one keyword"

I inserted a topical term into one of the records and it loaded successfully: https://repository.library.northeastern.edu/loaders/xml/report/1426

At least one subject is required, but we shouldn't require it to be a topical term.

elizoller commented 7 years ago

The XML and spreadsheet loaders both check that core_file.keywords is not blank, and keywords is mapped to core_file.mods.topics. It shouldn't be difficult to also accept other kinds of subject terms, although conceptually I'm not sure whether that makes the loaders deviate further from the edit form, which, as far as I know, maps to topics and validates on them.
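A minimal sketch of why a record with only non-topical subjects fails, assuming (as described above) that keywords is built only from topical subjects. `Subject`, `Record`, and `keyword_errors` are hypothetical stand-ins, not cerberus code; only the error string comes from the load report.

```ruby
# Hypothetical, simplified stand-in for a MODS subject (type + value).
# In cerberus the real data lives in core_file.mods; this only mirrors the idea.
Subject = Struct.new(:type, :value)

class Record
  attr_reader :subjects

  def initialize(subjects)
    @subjects = subjects
  end

  # Mirrors the described behavior: keywords come only from topical
  # subjects (core_file.mods.topics); blank values are ignored.
  def keywords
    subjects.select { |s| s.type == :topic }
            .map(&:value)
            .reject { |v| v.to_s.strip.empty? }
  end
end

# Loader-style check: reject the record when keywords is empty.
def keyword_errors(record)
  record.keywords.empty? ? ["Error - Must have at least one keyword"] : []
end

geographic_only = Record.new([Subject.new(:geographic, "Boston (Mass.)")])
with_topic = Record.new([Subject.new(:geographic, "Boston (Mass.)"),
                         Subject.new(:topic, "Urban renewal")])

p keyword_errors(geographic_only) # => ["Error - Must have at least one keyword"]
p keyword_errors(with_topic)      # => []
```

This matches the report: adding a topical term is enough to make an otherwise identical record pass.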

dgcliff commented 7 years ago

I think it has to deviate from the edit form. Users who don't have access to the XML editor shouldn't be able to alter authorized (LCSH) keywords from the edit form, so my understanding is that .keywords doesn't pick them up.

We could make a new helper method all_keywords, or similar?

elizoller commented 7 years ago

So the XML editor and the loaders would use an all_keywords check that accepts any subject (whether or not it is a topic), and the edit form would keep the keywords check (specifically looking at subject topics that are not authorized)?
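A minimal sketch of the split proposed above, under stated assumptions: `Subject` with an `authority` field (e.g. "lcsh" for authorized headings, nil otherwise), `Record`, `keyword_present?`, and the context symbols are hypothetical. `all_keywords` is the helper name suggested in the thread, but this body is only an illustration, not the cerberus implementation.

```ruby
# Hypothetical subject: type (:topic, :geographic, :name, ...), value, and
# authority ("lcsh" for authorized headings, nil for free-text terms).
Subject = Struct.new(:type, :value, :authority)

class Record
  attr_reader :subjects

  def initialize(subjects)
    @subjects = subjects
  end

  # Edit-form view: topical, non-authorized terms only.
  def keywords
    subjects.select { |s| s.type == :topic && s.authority.nil? }
            .map(&:value)
            .reject { |v| v.to_s.strip.empty? }
  end

  # Proposed helper: any subject value, regardless of type or authority.
  def all_keywords
    subjects.map(&:value).reject { |v| v.to_s.strip.empty? }
  end
end

# The edit form keeps the topic-based check; loaders and the XML editor
# (any other context here) accept any subject.
def keyword_present?(record, context)
  list = context == :edit_form ? record.keywords : record.all_keywords
  !list.empty?
end

lcsh_geographic_only = Record.new([Subject.new(:geographic, "Boston (Mass.)", "lcsh")])

p keyword_present?(lcsh_geographic_only, :xml_loader) # => true
p keyword_present?(lcsh_geographic_only, :edit_form)  # => false
```

Keeping both methods lets each entry point choose its rule explicitly instead of loosening the check the edit form relies on.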

dgcliff commented 7 years ago

@elizoller the XML validator already does keyword checking this way, and I don't think it restricts keywords to topics:

https://github.com/NEU-Libraries/cerberus/blob/master/lib/helpers/xml_validator.rb#L76

Do we not already do validation with that helper? Are we being redundant by checking for keywords again?

elizoller commented 7 years ago

Yeah, the XML loader for existing files runs a health check on the existing record: it checks that the record exists, that it is a CoreFile object, that it is not tombstoned, in progress, or incomplete, that it is healthy (parent and depositor checks), and that it has a title and a keyword. If it passes all of those checks, the load proceeds. I suppose it shouldn't check for the title and keyword at this point, since the whole point is that we're overriding the metadata anyway.
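A minimal sketch of the health-check order described above, with the title and keyword checks skipped when the load overrides the metadata. `ExistingFile`, its fields, `health_check`, the `override_metadata:` flag, the state strings, and every message except the keyword one are hypothetical placeholders, not the loader's real code.

```ruby
# Hypothetical stand-in for an existing repository object; field names are illustrative.
ExistingFile = Struct.new(:exists, :core_file, :state, :healthy, :title, :keywords,
                          keyword_init: true)

# Runs the checks in the order described and returns the first failure message,
# or nil when the record can be loaded. Messages other than the keyword error
# are placeholders.
def health_check(file, override_metadata:)
  return "Record does not exist" unless file.exists
  return "Not a CoreFile object" unless file.core_file
  return "Record is #{file.state}" if %w[tombstoned in_progress incomplete].include?(file.state)
  return "Record is not healthy (parent/depositor)" unless file.healthy

  # Title/keyword checks only matter when the existing metadata is kept.
  unless override_metadata
    return "Must have a title" if file.title.to_s.strip.empty?
    return "Error - Must have at least one keyword" if Array(file.keywords).empty?
  end
  nil
end

record = ExistingFile.new(exists: true, core_file: true, state: "active",
                          healthy: true, title: "Photograph", keywords: [])

p health_check(record, override_metadata: false) # => "Error - Must have at least one keyword"
p health_check(record, override_metadata: true)  # => nil
```

The structural checks (existence, type, state, parent and depositor) still run either way; only the metadata checks depend on whether the load replaces that metadata.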