Closed mikej888 closed 5 years ago
Commits 8c14a46ac6accd9e2446081999ef8640fac2cd65 to 39552d033be868ee1815a5a28e3e079a561bc465 refactor the code so that objects are created and then filtered according to whether creation succeeded or failed. Files that give rise to failures are recorded, along with the error message, in a separate YAML file, e.g. rerunning the above:
cat results.yml
{1714: 115, 1715: 302,
...
, 1950: 7155}
cat errors.yml
- [/.../some-issue.xml,
'Document is empty, line 1, column 1 (line 1)']
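The create-then-filter approach described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the commits: the helper names try_create and create_all are hypothetical, and the real code may structure this differently (for example, as operations on a distributed collection). Each failure is kept as a [filename, message] pair, which matches the shape of the entries shown in errors.yml.

```python
def try_create(create, filename):
    """Try to build an object from a file.

    Returns (obj, None) on success, or (None, [filename, message]) on failure.
    `create` is any callable that takes a filename (hypothetical stand-in
    for an object constructor).
    """
    try:
        return create(filename), None
    except Exception as error:  # deliberately broad: any failure is recorded
        return None, [filename, str(error)]


def create_all(create, filenames):
    """Build objects for all files, separating successes from failures."""
    objects, errors = [], []
    for filename in filenames:
        obj, error = try_create(create, filename)
        if error is None:
            objects.append(obj)
        else:
            errors.append(error)
    return objects, errors
```

Queries then run over the returned objects only, and the errors list can be written to a separate file, so a document that fails to parse never contributes placeholder values to the results.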
If defoe.papers.issue.Issue fails to parse an XML document, an object is still built with empty strings, lists etc. as fields and datetime.now() as the date. This can give misleading results when running queries in which such documents are encountered, e.g.