GCRC / nunaliit

Nunaliit Atlas Framework
BSD 3-Clause "New" or "Revised" License

Invalid DateStructure - JSONObject["min"] not found #923

Open ahayes opened 4 years ago

ahayes commented 4 years ago

This error sometimes appears when processing PDF attachments. Noticed on the NRI atlas after a restart, running nunaliit_2.2.9-SNAPSHOT_2020-06-11_fb292e8.

2020-07-03 15:10:16,013[ERROR]: Error performing work PROCESS_DOC
org.json.JSONException: JSONObject["min"] not found.
    at org.json.JSONObject.get(JSONObject.java:573)
    at org.json.JSONObject.getNumber(JSONObject.java:712)
    at org.json.JSONObject.getLong(JSONObject.java:785)
    at ca.carleton.gcrc.couch.date.impl.DateStructureElement.<init>(DateStructureElement.java:15)
    at ca.carleton.gcrc.couch.date.impl.DateRobotThread.performProcessDocument(DateRobotThread.java:281)
    at ca.carleton.gcrc.couch.date.impl.DateRobotThread.performWork(DateRobotThread.java:237)
    at ca.carleton.gcrc.couch.date.impl.DateRobotThread.activity(DateRobotThread.java:125)
    at ca.carleton.gcrc.couch.date.impl.DateRobotThread.run(DateRobotThread.java:91)

roikle commented 4 years ago

Writing to confirm that I've noticed the same error on my local system.

ahayes commented 4 years ago

If you can isolate a document that is causing this error and examine any nunaliit date objects it contains, that might provide some clues. It looks like line 15 of nunaliit2-couch-date/src/main/java/ca/carleton/gcrc/couch/date/impl/DateStructureElement.java is trying to obtain the minimum (or only) date from a nunaliit date object and failing.

roikle commented 4 years ago

The error is being caused by an invalid date structure, produced by invalid dates in the import data:

{
    "date":"215-02-03",
    "nunaliit_type":"date",
    "index":1
}

The DateStructureElement constructor expects a "min" key in this JSON object. When it tries to read the long value of "min" and the key doesn't exist, a JSONException is thrown.

I'd recommend we add some defensive code, such as checking if the min key exists before we attempt to use it.
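
A minimal sketch of what that check could look like. This is not the actual DateStructureElement code: the class name DateElementSketch is hypothetical, and a plain java.util.Map stands in for org.json.JSONObject so the snippet compiles without dependencies. With org.json, `has("min")` would play the role of the presence check.

```java
import java.util.Map;

// Hypothetical stand-in for DateStructureElement; only the "min" handling is sketched.
final class DateElementSketch {
    final long min;

    // Assumption: a Map replaces org.json.JSONObject to keep the sketch dependency-free.
    DateElementSketch(Map<String, Object> dateObj) {
        Object raw = dateObj.get("min");
        // Fail early with a descriptive message instead of a bare "not found".
        if (!(raw instanceof Number)) {
            throw new IllegalArgumentException(
                "Invalid date structure: missing or non-numeric \"min\" in " + dateObj);
        }
        this.min = ((Number) raw).longValue();
    }
}
```

Passing the object shown above (which has "date", "nunaliit_type" and "index" but no "min") would then fail with a message that includes the offending object.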

ahayes commented 4 years ago

When the data is wrong, throwing an exception isn't a bad thing, as long as we move on (which I think we do in this case). It would be nice if the error output also included the key, the problematic value, and the id of the offending doc.
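
Roughly what such a message could look like, as a sketch only. The class DateErrorReport and the method describe are hypothetical names, not existing Nunaliit code, and a Map again stands in for the JSON object.

```java
import java.util.Map;

// Hypothetical helper; not part of the Nunaliit code base.
final class DateErrorReport {
    // Builds a log line naming the document, the missing key, and the offending date value.
    static String describe(String docId, String missingKey, Map<String, Object> dateObj) {
        return String.format(
            "Invalid date structure in doc %s: key \"%s\" not found (date=%s)",
            docId, missingKey, dateObj.get("date"));
    }
}
```

For a made-up document id of doc-123, `DateErrorReport.describe("doc-123", "min", Map.of("date", "215-02-03"))` yields `Invalid date structure in doc doc-123: key "min" not found (date=215-02-03)`.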

In this case, we should also look into the importer code. We shouldn't have invalid Nunaliit date objects getting created in the first place, although I recognize that we don't expect that we are always the ones creating the documents used by Nunaliit. But for the importer, we might need more checks.
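
One possible shape for such a check, sketched under assumptions: the document is treated as nested java.util.Map and List values rather than the importer's real types, and DateObjectScan and findDatesMissingMin are hypothetical names. It only flags objects tagged with nunaliit_type "date" that have no "min" key, which is the condition seen in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-save check; not existing importer code.
final class DateObjectScan {
    // Returns the path of every object tagged nunaliit_type "date" that lacks "min".
    static List<String> findDatesMissingMin(Object node, String path) {
        List<String> bad = new ArrayList<>();
        if (node instanceof Map<?, ?>) {
            Map<?, ?> map = (Map<?, ?>) node;
            if ("date".equals(map.get("nunaliit_type")) && !map.containsKey("min")) {
                bad.add(path);
            }
            // Recurse into every value so nested date objects are found too.
            for (Map.Entry<?, ?> e : map.entrySet()) {
                bad.addAll(findDatesMissingMin(e.getValue(), path + "/" + e.getKey()));
            }
        } else if (node instanceof List<?>) {
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                bad.addAll(findDatesMissingMin(list.get(i), path + "/" + i));
            }
        }
        return bad;
    }
}
```

Calling `DateObjectScan.findDatesMissingMin(doc, "")` before saving would let the importer reject or log a document when the returned list is not empty. The same scan could also help isolate existing documents that trigger the error.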