Open afandian opened 6 years ago
A NullPointerException (NPE) is being thrown at:
https://github.com/CrossRef/cayenne/blob/master/src/cayenne/schedule.clj#L120
(defjob update-funders [ctx]
  (try
    (info "Updating funders from new RDF")
    (let [time-of-this-update (time/now)
          time-of-previous-update (get-last-funder-update)
          last-modified-header (-> @(http/head (conf/get-param [:location :cr-funder-registry]))
                                   :headers :last-modified)
>>>       funders-last-modified (timef/parse last-modified-format last-modified-header)]
>>>   (when (time/after? funders-last-modified time-of-previous-update)
        (funder/clear!)
        (funder/drop-loading-collection)
        (funder/load-funders-rdf (java.net.URL. (conf/get-param [:location :cr-funder-registry])))
        (funder/swapin-loading-collection)
        (write-last-funder-update time-of-this-update)))
    (catch Exception e (error e "Failed to update funders from RDF"))))
It looks like the "Last-Modified" HTTP header is coming back empty from the cr-funder-registry URL. I reckon this is because the server doesn't know the modification time, maybe because the timestamp is missing from the file. Either way the header shouldn't be compulsory, and its absence shouldn't be a problem.
I would put a failsafe in here to handle when it's nil:
>>> funders-last-modified (when last-modified-header (timef/parse last-modified-format last-modified-header))]
>>> (when (or (nil? funders-last-modified) (time/after? funders-last-modified time-of-previous-update))
i.e. if the header is missing, assume that we need to re-ingest.
The Last-Modified date is present on the HEAD request for the registry file. Tested repeatedly.
Doing an ingest each time the date is nil is not practical: based on observations, this would occur up to 12 times a day.
Root cause: in the datacenter, the request frequently takes a long time to reach dx.doi.org.
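Given that, one option is to treat a failed or header-less HEAD response as "try again on the next run" rather than as a reason to re-ingest. Below is a minimal sketch of that decision, not the code in cayenne. It assumes the dereferenced response is a map with :headers and, on failure, an :error key (the http-kit client convention; I have not confirmed which client cayenne uses). The names parse-last-modified and update-decision are hypothetical, and java.time is used in place of clj-time only to keep the example self-contained.

```clojure
(import '(java.time ZonedDateTime)
        '(java.time.format DateTimeFormatter))

(defn parse-last-modified
  "Parses an HTTP Last-Modified header value (RFC 1123 date).
   Returns nil when the header is absent instead of throwing."
  [header]
  (when header
    (ZonedDateTime/parse header DateTimeFormatter/RFC_1123_DATE_TIME)))

(defn update-decision
  "Decides what the job should do with a HEAD response.
   ASSUMPTION: `response` is a map with :headers and, on failure, :error.
   Returns :retry-later when the request failed or carried no header,
   :ingest when the registry is newer than the previous update,
   and :skip otherwise."
  [response previous-update]
  (let [modified (parse-last-modified
                   (get-in response [:headers :last-modified]))]
    (cond
      (:error response)                   :retry-later
      (nil? modified)                     :retry-later
      (.isAfter modified previous-update) :ingest
      :else                               :skip)))
```

With this shape the job would only run the clear!/load/swap-in sequence on :ingest, and would log and return on :retry-later, so a slow request to dx.doi.org costs one skipped run instead of a full ingest. Whether skipping is acceptable depends on how often the registry really changes.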