Open sdm7g opened 5 years ago
On further use and testing, I have found a couple of cases where it does recover from parsing a malformed payload. But where the issue is mismatched tags, the errors cascade outward to the OAI container, so the resumption token is not found and parsed correctly, and the harvest halts. (At least that's my initial diagnosis.) It still gives plenty of parser warning messages before halting; however, I'd like to figure out a better way to unambiguously identify this case.
I still consider this behavior better than before, but perhaps it should be documented that --recover could have this side effect.
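One possible way to flag this case, sketched against lxml directly rather than through pyoai: parse with recover=True, then look at both the parser's error log and whether a resumptionToken element survived. The check_response helper and the sample payloads are hypothetical and not part of this PR. Note that the last page of a harvest legitimately carries no token value, so a missing token is only a hint when it coincides with parser errors.

```python
from lxml import etree

# Namespace used by OAI-PMH 2.0 response envelopes.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def check_response(xml_bytes):
    """Parse with recovery (hypothetical helper, not pyoai API).

    Returns the recovered root element, a list of (line, message)
    tuples from the parser's error log, and the resumptionToken
    element if one can still be found anywhere in the tree.
    """
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(xml_bytes, parser)
    errors = [(entry.line, entry.message) for entry in parser.error_log]
    token = None
    if root is not None:
        token = root.find(".//{%s}resumptionToken" % OAI_NS)
    return root, errors, token


def is_suspect(errors, token):
    """Heuristic only: parser errors plus a missing token suggests the
    damage reached the OAI container rather than a single record."""
    return bool(errors) and token is None
```

A harvest loop could call is_suspect() after each page and log a distinct message before stopping, instead of relying on the volume of parser warnings.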
This adds an XMLParser( recover=True ) option to pyoai/client and includes that module locally. #25 (I've had an outstanding pull request to fix code in that project.)
Adds an optional --recover / --no-recover option to the command line args. (The default is --no-recover, to be conservative and keep the same behavior.)
Adds code to log errors as warnings when using the recover option, and to log which identifiers had parser errors, so that upstream feeds can be notified.
Also fixes some issues with the logger config that cropped up when testing this code: harvest.py logging now goes to harvest.log, and registry.py logging goes to registry.log. #26
Also adds some missing whitespace in help strings.
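As a rough illustration of the option and the warning logging described above, here is a minimal standard-library sketch. It is not the code from this PR: build_parser, log_parse_errors, the help strings, and the "harvest" logger name are assumptions made for the example. The paired flags write to one destination and default to no recovery, matching the conservative default.

```python
import argparse
import logging


def build_parser():
    """Illustrative CLI with a paired --recover / --no-recover switch."""
    parser = argparse.ArgumentParser(description="Harvest OAI-PMH records.")
    parser.add_argument("--recover", dest="recover", action="store_true",
                        help="try to recover from XML parse errors")
    parser.add_argument("--no-recover", dest="recover", action="store_false",
                        help="stop on XML parse errors (default)")
    # Conservative default: keep the pre-existing strict behavior.
    parser.set_defaults(recover=False)
    return parser


def log_parse_errors(logger, identifier, errors):
    """Log each (line, message) pair as a warning tagged with the record
    identifier, so the upstream feed can be notified. Returns the count."""
    for line, message in errors:
        logger.warning("parse error in %s (line %s): %s",
                       identifier, line, message)
    return len(errors)
```

For example, build_parser().parse_args(["--recover"]).recover is True, and the resulting flag could be passed on to whatever constructs the XML parser.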