GFDRR / thinkhazard_processing

Think Hazard: Overcome Risk - Processing module

harvesting based on dates #28

Open fvanderbiest opened 8 years ago

fvanderbiest commented 8 years ago

Harvesting http://45.55.174.20/api/layers/264/ fields (chronologically ordered):

"creation_date": "2015-11-03T07:20:00",
"publication_date": "2015-11-03T07:20:00",
"date": "2015-11-03T07:20:00",
"date_type": "publication",
"metadata_update_date": "2015-12-16T16:01:18.182183",
"csw_insert_date": "2015-12-16T16:01:18.213792",
"data_update_date": "2015-12-22T04:49:14.909047",

I find it counter-intuitive that data_update_date > metadata_update_date

We have to take this into account in our GeoNode harvesting process. We don't want to miss the last time a dataset was updated. Yet if we harvest GeoNode with only &metadata_update_date__gte=2015-05-30T13:00:00, a data update that leaves metadata_update_date unchanged could be missed.
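To illustrate the concern, here is a minimal sketch using the two dates from the record above. The `last_harvest` cutoff and the `changed_since` helper are hypothetical and only serve the example; the comparison is done locally, not against the GeoNode API.

```python
from datetime import datetime

# Date fields copied from the layer 264 record quoted above.
layer = {
    "metadata_update_date": "2015-12-16T16:01:18.182183",
    "data_update_date": "2015-12-22T04:49:14.909047",
}


def changed_since(record, field, cutoff):
    """Return True if the ISO 8601 timestamp stored in `field` is at or after `cutoff`."""
    return datetime.fromisoformat(record[field]) >= cutoff


# Hypothetical time of the previous harvest (an assumption, not from this thread).
last_harvest = datetime(2015, 12, 20, 0, 0, 0)

# A filter on metadata_update_date alone does not select this layer...
metadata_hit = changed_since(layer, "metadata_update_date", last_harvest)
# ...although its data changed after the previous harvest.
data_hit = changed_since(layer, "data_update_date", last_harvest)

print(metadata_hit, data_hit)
```

With a cutoff that falls between the two dates, only the check on data_update_date selects the layer.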

fvanderbiest commented 8 years ago

@ingenieroariel do you see what I mean here?

fvanderbiest commented 8 years ago

Well, that's not so important if data_update_date > metadata_update_date.

We can imagine that our harvesting process will perform two queries to GeoNode (instead of just one, as I originally thought):

If query A fetches records, we have to update the corresponding Layer record in our database, and set the related HazardSet:complete, HazardSet:processed to false.

If query B fetches records, we have to set Layer:downloaded, HazardSet:complete, HazardSet:processed to false.
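A rough sketch of how this could look, under stated assumptions: query A is taken to filter on metadata_update_date__gte and query B on data_update_date__gte. The data_update_date__gte filter is a guess by analogy and has not been checked against GeoNode. The in-memory dictionaries stand in for our Layer and HazardSet tables, and `build_query` and `apply_harvest` are hypothetical names.

```python
from urllib.parse import urlencode

API_URL = "http://45.55.174.20/api/layers/"  # GeoNode instance mentioned in this thread


def build_query(field, since):
    """Build a harvest URL such as ...?metadata_update_date__gte=2015-05-30T13:00:00."""
    # safe=":" keeps the colons of the timestamp readable in the URL.
    return API_URL + "?" + urlencode({field + "__gte": since}, safe=":")


def apply_harvest(layers, hazardsets, query_a_records, query_b_records):
    """Apply the results of both queries to in-memory stand-ins for our tables.

    layers:     {layer_id: {"hazardset_id": ..., "downloaded": bool, ...}}
    hazardsets: {hazardset_id: {"complete": bool, "processed": bool}}
    """
    # Query A (assumed: metadata changed): refresh the Layer record.
    for rec in query_a_records:
        layer = layers[rec["id"]]
        layer.update({k: v for k, v in rec.items() if k != "id"})
        _invalidate(hazardsets[layer["hazardset_id"]])

    # Query B (assumed: data changed): the file must be downloaded again.
    for rec in query_b_records:
        layer = layers[rec["id"]]
        layer["downloaded"] = False
        _invalidate(hazardsets[layer["hazardset_id"]])


def _invalidate(hazardset):
    # The related hazard set has to be completed and processed again.
    hazardset["complete"] = False
    hazardset["processed"] = False
```

Keeping the two queries separate means a metadata-only change does not trigger a new download, while a data change always does.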

ingenieroariel commented 8 years ago

Reading this now, I think we need to check on the GeoNode side whether those dates are correct.

I'll look at it next week and post here an update.

fvanderbiest commented 8 years ago

> I'll look at it next week and post here an update.

Thanks, Ariel!

fvanderbiest commented 8 years ago

Ping @ingenieroariel: can you confirm that data_update_date and metadata_update_date are independent in GeoNode? Thanks.