CrossRef / rest-api-doc

Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/

NPE in funder date formatting #326

Open afandian opened 6 years ago

afandian commented 6 years ago
2018-Feb-10 11:01:02 +0000 XXXXXXXXXXXXXXXXXXXX ERROR [cayenne.schedule] - Failed to update funders from RDF
              org.quartz.simpl.SimpleThreadPool$WorkerThread.run          SimpleThreadPool.java:  557
                                 org.quartz.core.JobRunShell.run               JobRunShell.java:  213
                         cayenne.schedule.update-funders/execute                   schedule.clj:  120
                                           clj-time.format/parse                     format.clj:  149
                              clj-time.format/parse/invokeStatic                     format.clj:  153
            org.joda.time.format.DateTimeFormatter.parseDateTime         DateTimeFormatter.java:  853
org.joda.time.format.DateTimeFormatterBuilder$Composite.parseInto  DateTimeFormatterBuilder.java: 2741
org.joda.time.format.DateTimeFormatterBuilder$TextField.parseInto  DateTimeFormatterBuilder.java: 1874
java.lang.NullPointerException:

afandian commented 6 years ago

The NPE is being thrown at:

https://github.com/CrossRef/cayenne/blob/master/src/cayenne/schedule.clj#L120

    (defjob update-funders [ctx]
      (try
        (info "Updating funders from new RDF")
        (let [time-of-this-update (time/now)
              time-of-previous-update (get-last-funder-update)
              last-modified-header (-> @(http/head (conf/get-param [:location :cr-funder-registry]))
                                       :headers :last-modified)
>>>           funders-last-modified (timef/parse last-modified-format last-modified-header)]
>>>       (when (time/after? funders-last-modified time-of-previous-update)
            (funder/clear!)
            (funder/drop-loading-collection)
            (funder/load-funders-rdf (java.net.URL. (conf/get-param [:location :cr-funder-registry])))
            (funder/swapin-loading-collection)
            (write-last-funder-update time-of-this-update)))
        (catch Exception e (error e "Failed to update funders from RDF"))))

This looks like the "Last-Modified" HTTP header is coming back empty from the cr-funder-registry URL, so timef/parse is handed nil and throws. I reckon the server doesn't know the value, maybe because the timestamp is missing from the file. Either way, the header shouldn't be compulsory, and its absence shouldn't be a problem.

I would put a failsafe in here to handle when it's nil:

>>>           funders-last-modified (when last-modified-header (timef/parse last-modified-format last-modified-header))]
>>>         (when (or (nil? funders-last-modified) (time/after? funders-last-modified time-of-previous-update))

i.e. if the header is missing, assume that we need to re-ingest.
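
The same guard can be tried out at a REPL without the rest of the job. What follows is only a minimal sketch, not the cayenne code: it swaps clj-time for plain java.time interop so it runs with no dependencies, it assumes the header arrives in RFC 1123 form, and the names parse-last-modified and needs-reingest? are made up for illustration.

```clojure
(import '(java.time Instant ZonedDateTime)
        '(java.time.format DateTimeFormatter))

;; Hypothetical helper: parse a Last-Modified header value, tolerating nil.
;; Assumes RFC 1123 format, e.g. "Sat, 10 Feb 2018 11:01:02 GMT".
(defn parse-last-modified
  [^String header]
  (when header
    (.toInstant (ZonedDateTime/parse header DateTimeFormatter/RFC_1123_DATE_TIME))))

;; Hypothetical helper: mirrors the proposed `or` guard.
;; A missing timestamp is treated as "re-ingest".
(defn needs-reingest?
  [^Instant funders-last-modified ^Instant time-of-previous-update]
  (or (nil? funders-last-modified)
      (.isAfter funders-last-modified time-of-previous-update)))
```

With this shape, a nil header never reaches the parser, and the decision to re-ingest is made in one place.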

ckoscher commented 6 years ago

The Last-Modified date is present on the HEAD request for the registry file. Tested repeatedly.

Doing an ingest each time the date is nil is not practical: based on observations, this would occur up to 12 times a day.

The root cause is that, in the datacenter, the request frequently takes a long time to reach dx.doi.org.
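
If re-ingesting on every nil is too expensive, one alternative is to treat a missing header as "no information" and skip that run. This is only a sketch of that policy, not something agreed in this thread: the function name update-decision and the keywords it returns are hypothetical, and it uses java.time instants rather than clj-time so it runs standalone.

```clojure
(import '(java.time Instant))

;; Hypothetical decision function: a nil timestamp (e.g. the HEAD request
;; did not yield a Last-Modified header) skips this run instead of
;; forcing a full re-ingest.
(defn update-decision
  [^Instant funders-last-modified ^Instant time-of-previous-update]
  (cond
    (nil? funders-last-modified) :skip
    (.isAfter funders-last-modified time-of-previous-update) :reingest
    :else :up-to-date))
```

The job would then call the funder/clear! ... swapin sequence only on :reingest, and could log :skip separately so slow requests stay visible without raising an NPE.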