GSA / datagov-wptheme

Data.gov WordPress Theme (obsolete)
https://www.data.gov

Geospatial metadata records with point bounding boxes are rejected during harvesting #618

Open oconnor77 opened 9 years ago

oconnor77 commented 9 years ago

Geospatial metadata records with point bounding boxes are currently rejected during harvesting because CKAN is unable to store point data, as described in the code comments at these lines:

https://github.com/GSA/ckanext-spatial/blob/datagov/ckanext/spatial/harvesters/base.py#L310-L323

This is a bug in the system. Both of the originating metadata sources (CSDGM and ISO) allow point coordinates to be captured in these fields. In the records that we are unable to publish, the data are time series of oceanographic or meteorological measurements collected at a single location. There is no bounding box.

Previous guidance provided when discussing this issue has been to manually alter the originating metadata to create "the box" needed to pass validation. However, we are reluctant to alter the source documentation because it accurately reflects the data, and we don't want to introduce errors into the documentation to meet portal-specific publication requirements, which often change as systems evolve.

If the system truly cannot handle a point, it would make sense, in my opinion, for the system to pad one coordinate pair by a thousandth of a degree on ingest and store that set of coordinates in the database. This would create the locally needed "bounding box" while still providing users with the accurate coordinates when linking to the originating source metadata.
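
A minimal sketch of that padding idea, for illustration only. The function name `pad_degenerate_bbox`, the constant `PAD_DEGREES`, and the choice to move the upper-right corner are assumptions made here; none of this is taken from ckanext-spatial.

```python
PAD_DEGREES = 0.001  # "a thousandth of a degree", as proposed above


def pad_degenerate_bbox(west, south, east, north, pad=PAD_DEGREES):
    """Widen a zero-width or zero-height box so that it has a non-zero area.

    Only the degenerate axis is changed. The upper-right corner is moved
    where possible; at the edge of the valid coordinate range the
    lower-left corner is moved instead. Boxes that already have an area
    are returned unchanged.
    """
    if west == east:
        if east + pad <= 180.0:
            east += pad
        else:
            west -= pad
    if south == north:
        if north + pad <= 90.0:
            north += pad
        else:
            south -= pad
    return west, south, east, north


# Hypothetical point record: a mooring at 70.5 W, 41.25 N.
padded = pad_degenerate_bbox(-70.5, 41.25, -70.5, 41.25)
```

The source metadata stays untouched in this approach; only the value stored for the portal's own indexing is padded.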

rsignell-usgs commented 9 years ago

We here at USGS have lots of time series records that have point bounding boxes also, so I'm also interested in getting this resolved.

JJediny commented 8 years ago

Any suggestions on a responsible approach to address this? Using a single point to derive a BBOX could get sloppy if the datasets are being published at a variety of scales, no? However, I could see value in a rule that, if xmin == xmax and ymin == ymax, subtracts 0.000000001 from xmin/ymin and adds 0.000000001 to xmax/ymax, so that it creates a micro-polygon. That would still be useful in the CKAN spatial filter: as long as the dataset is contained in the search bounds, it will still be listed?
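
A rough sketch of the micro-polygon rule, reading the condition as a box with zero width and zero height (xmin == xmax and ymin == ymax). The function name `bbox_to_geojson_polygon` and the constant `EPSILON` are made up for this example, and the output is plain GeoJSON rather than anything specific to CKAN.

```python
import json

EPSILON = 0.000000001  # offset suggested in the comment above


def bbox_to_geojson_polygon(xmin, ymin, xmax, ymax, eps=EPSILON):
    """Return a GeoJSON Polygon for a bbox, inflating a point by `eps`."""
    if xmin == xmax and ymin == ymax:
        # Degenerate box: grow it symmetrically around the point.
        xmin, ymin = xmin - eps, ymin - eps
        xmax, ymax = xmax + eps, ymax + eps
    # Closed exterior ring, counterclockwise, first position repeated last.
    ring = [
        [xmin, ymin],
        [xmax, ymin],
        [xmax, ymax],
        [xmin, ymax],
        [xmin, ymin],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


geometry = bbox_to_geojson_polygon(-70.5, 41.25, -70.5, 41.25)
geometry_json = json.dumps(geometry)
```

Because the offset is fixed, the resulting polygon says nothing about the scale of the dataset, which is the sloppiness concern raised in the same comment.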

rsignell-usgs commented 8 years ago

This issue came up at the NOAA Data Management and Integration Team meeting today. Other catalog systems handle this just fine -- why not data.gov?

What is the problem here? It would seem that geospatial search would be fine, since xmin <= x and x <= xmax returns True even when xmin == x == xmax.
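
To make that argument concrete, here is a small sketch of a closed-interval box intersection test. The helper `bboxes_intersect` is hypothetical and only illustrates the arithmetic; it does not describe how the data.gov search backend evaluates spatial queries.

```python
def bboxes_intersect(a, b):
    """Closed-interval overlap test for two (xmin, ymin, xmax, ymax) boxes."""
    a_xmin, a_ymin, a_xmax, a_ymax = a
    b_xmin, b_ymin, b_xmax, b_ymax = b
    return (
        a_xmin <= b_xmax and b_xmin <= a_xmax
        and a_ymin <= b_ymax and b_ymin <= a_ymax
    )


# A "point bounding box": xmin == xmax and ymin == ymax.
point_box = (-70.5, 41.25, -70.5, 41.25)
search_box = (-71.0, 41.0, -70.0, 42.0)
far_away_box = (0.0, 0.0, 1.0, 1.0)

hit = bboxes_intersect(point_box, search_box)
miss = bboxes_intersect(point_box, far_away_box)
```

With inclusive comparisons, a zero-area box inside the search box matches (`hit` is `True`) and one outside does not (`miss` is `False`), so the comparison itself does not require a non-zero area.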

JJediny commented 8 years ago

On second look... I haven't looked into how CKAN-spatial handles this, but it seems there's no reason (given that the data.gov spatial field is stored as GeoJSON) that a point couldn't be represented as a single point straight through to CKAN. At the same time, what is the spatial extent of an example dataset? Could one agree that the point is just as good a filter as any bbox?

FuhuXia commented 8 years ago

The CKAN spatial extension handles both the Point and Polygon types fine. But for the backend spatial search we use Solr, and Solr does not support anything other than a BBOX Polygon. If we are OK with the fact that Point-type records will be excluded from spatial search results, then it is fine to harvest the Point type.

JJediny commented 8 years ago

Ah, I was wondering what the real issue was. I missed the Solr aspect completely, thanks @FuhuXia... So it looks like there are two options: either a dumb addition/subtraction of a degree to fake a micro-BBOX, or leverage Python Shapely to compute a buffer from the point and figure out a way to reduce it to a rectangle to store in PostGIS and use for Solr?

FuhuXia commented 8 years ago

A better option would be to upgrade our Solr and/or CKAN core to versions in which Point geometries are indexed and searchable. We are using CKAN 2.1a and Solr 4. I have heard people have better luck with CKAN 2.3 and Solr 5.

JJediny commented 8 years ago

+1 if the upgrade will nullify the issue

kvuppala commented 8 years ago

@JJediny @FuhuXia @philipashlock The main issue referred to here has been implemented; should we close this issue? I think we have a separate issue about search not returning point spatial datasets.