GSA / datagov-wptheme

Data.gov WordPress Theme (obsolete)
https://www.data.gov

CSW Query on data time range #592

Closed: rsignell-usgs closed this issue 9 years ago.

rsignell-usgs commented 9 years ago

In the IPython Notebook here: http://nbviewer.ipython.org/gist/rsignell-usgs/64f3ccd39ecb7c69cd18 I try restricting a Data.gov CSW query by apiso:AnyText and by date range, but the query returns zero records.

To specify the date range, I'm using apiso:TempExtent_begin and apiso:TempExtent_end.
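For reference, the usual way to express "dataset overlaps my time window" with these two queryables is: the dataset begins on or before the window's end AND ends on or after the window's start. The sketch below builds that as an OGC Filter Encoding 1.1 `ogc:Filter` (the constraint language of CSW 2.0.2 GetRecords) using only the standard library. It is an illustration, not the code from the linked notebook; the search text and dates are placeholders. OWSLib's `owslib.fes` module offers equivalent classes (`PropertyIsLike`, `PropertyIsLessThanOrEqualTo`, `PropertyIsGreaterThanOrEqualTo`, `And`).

```python
import xml.etree.ElementTree as ET

# OGC Filter Encoding 1.1 namespace, used by CSW 2.0.2 GetRecords constraints
OGC = "http://www.opengis.net/ogc"
ET.register_namespace("ogc", OGC)


def _comparison(op, prop, literal):
    """Build <ogc:{op}><ogc:PropertyName/><ogc:Literal/></ogc:{op}>."""
    el = ET.Element(f"{{{OGC}}}{op}")
    ET.SubElement(el, f"{{{OGC}}}PropertyName").text = prop
    ET.SubElement(el, f"{{{OGC}}}Literal").text = literal
    return el


def build_filter(text, start, stop):
    """Return an ogc:Filter for records whose full text contains `text`
    and whose temporal extent overlaps the window [start, stop].

    Overlap test: TempExtent_begin <= stop AND TempExtent_end >= start.
    """
    flt = ET.Element(f"{{{OGC}}}Filter")
    both = ET.SubElement(flt, f"{{{OGC}}}And")

    # Free-text match on apiso:AnyText, wrapped in wildcards
    like = ET.SubElement(both, f"{{{OGC}}}PropertyIsLike",
                         wildCard="*", singleChar="?", escapeChar="\\")
    ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "apiso:AnyText"
    ET.SubElement(like, f"{{{OGC}}}Literal").text = f"*{text}*"

    # Dataset must start no later than the end of the window...
    both.append(_comparison("PropertyIsLessThanOrEqualTo",
                            "apiso:TempExtent_begin", stop))
    # ...and end no earlier than the start of the window
    both.append(_comparison("PropertyIsGreaterThanOrEqualTo",
                            "apiso:TempExtent_end", start))
    return flt
```

Serializing with `ET.tostring(build_filter("salinity", "2010-01-01", "2010-12-31"), encoding="unicode")` gives the XML that would go inside a GetRecords `csw:Constraint`. Note that if the catalog never populated the two TempExtent queryables, a filter like this matches nothing, whatever the search text.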

This is the same approach I've used successfully on other CSW endpoints, so I'm wondering why it doesn't work here.

Is it because: A) the temporal search is not implemented B) I'm doing something wrong C) The data begin and end times are not populated in the metadata at Data.gov D) something else?

rsignell-usgs commented 9 years ago

@kalxas, do you know the answer here?

kalxas commented 9 years ago

hi @rsignell-usgs, I am testing and will come back on this. Temporal search is implemented in pycsw, that's for sure; I am testing to see whether it's C or D.

rsignell-usgs commented 9 years ago

@kalxas, excellent. Just for reference, here's a CSW example working with time extent filters on NGDC Geoportal: http://nbviewer.ipython.org/gist/rsignell-usgs/fab9171740f6bef11a5e

kalxas commented 9 years ago

After some testing I see that the TempExtent_begin and TempExtent_end columns are not populated. Two possibilities: Either this information is missing from the original XML files, or the information was not parsed due to an XML parsing bug. Still looking...

rsignell-usgs commented 9 years ago

@kalxas, okay, thanks for the update.

If it helps your investigation, here is the metadata for this dataset: http://catalog.data.gov/dataset/temperature-and-salinity-profile-data-from-the-whoi-hawaii-ocean-time-series-site-whots-program

It originated at NODC, and the original record is here: http://data.nodc.noaa.gov/geoportal/rest/document?id={909165DB-55D4-4773-A3DE-2EEE8F7F3CC4}

You can see that, at least in the original metadata, the temporal extent information is included:

<gmd:temporalElement>
  <gmd:EX_TemporalExtent id="boundingTemporalExtent">
    <gmd:extent>
      <gml:TimePeriod gml:id="boundingTemporalExtentPeriod">
        <gml:beginPosition>2004-08-12</gml:beginPosition>
        <gml:endPosition>2011-07-11</gml:endPosition>
      </gml:TimePeriod>
    </gmd:extent>
  </gmd:EX_TemporalExtent>
</gmd:temporalElement>
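To check by hand whether a record like this carries a usable extent, a minimal standard-library sketch that pulls `beginPosition`/`endPosition` out of `gmd:EX_TemporalExtent` is shown below. It is not OWSLib's ISO parser and says nothing about what the OWSLib fix changed. The excerpt does not show which URI the `gml` prefix is bound to, so, as an assumption, the sketch accepts both the GML 3.2 and the older GML namespace.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
# Assumption: the gml prefix may be bound to either URI, so try both
GML_URIS = ("http://www.opengis.net/gml/3.2", "http://www.opengis.net/gml")


def temporal_extent(xml_text):
    """Return (begin, end) strings from the first gml:TimePeriod found
    under a gmd:EX_TemporalExtent, or (None, None) if there is none."""
    root = ET.fromstring(xml_text)
    for ext in root.iter(f"{{{GMD}}}EX_TemporalExtent"):
        for gml in GML_URIS:
            period = ext.find(f"{{{GMD}}}extent/{{{gml}}}TimePeriod")
            if period is not None:
                return (period.findtext(f"{{{gml}}}beginPosition"),
                        period.findtext(f"{{{gml}}}endPosition"))
    return None, None


# The fragment from this comment, with namespace declarations added
SAMPLE = """<gmd:temporalElement xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gml="http://www.opengis.net/gml/3.2">
  <gmd:EX_TemporalExtent id="boundingTemporalExtent">
    <gmd:extent>
      <gml:TimePeriod gml:id="boundingTemporalExtentPeriod">
        <gml:beginPosition>2004-08-12</gml:beginPosition>
        <gml:endPosition>2011-07-11</gml:endPosition>
      </gml:TimePeriod>
    </gmd:extent>
  </gmd:EX_TemporalExtent>
</gmd:temporalElement>"""
```

Here `temporal_extent(SAMPLE)` returns `("2004-08-12", "2011-07-11")`. Getting `(None, None)` for a record that visibly has a `TimePeriod` would point to a namespace or path mismatch in the parser rather than missing metadata.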

I'm hoping this wasn't a casualty of metadata conversion. Did this enter Data.gov as dumbed-down JSON metadata?

kalxas commented 9 years ago

@rsignell-usgs it seems that the ISO parsing failed for the time extent. We need to update our OWSLib to include this commit: https://github.com/geopython/OWSLib/commit/ee418807647ee1d5b34e4d93a767dde83307baa8

rsignell-usgs commented 9 years ago

It's great you found the problem!!

On Saturday, February 28, 2015, Angelos Tzotsos notifications@github.com wrote:

Dr. Richard P. Signell (508) 457-2229 USGS, 384 Woods Hole Rd. Woods Hole, MA 02543-1598

FuhuXia commented 9 years ago

We have cherry-picked this commit into the GSA branch: https://github.com/GSA/OWSLib/commit/4b41fe648871a1fe164eb48e5518c826f9e5e1a1

kalxas commented 9 years ago

@FuhuXia we need to re-ingest all records in order for temporal information to show up after this patch.

kvuppala commented 9 years ago

@kalxas The fix has been applied in staging and production. Re-ingest of all records is in progress in production; once it is complete, we will switch the database over so the fix takes effect.
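Why a full re-ingest plus a database switch? Columns such as the temporal extent are derived from the raw XML when a record is loaded, so patching the parser does nothing for records that are already stored. The sketch below illustrates the general pattern only: the SQLite table `records`, its `time_begin`/`time_end` columns, the identifier `whots`, and the toy parser are all hypothetical, and nothing in this thread describes Data.gov's actual schema or pipeline. It rebuilds the derived columns into a side table, then drops and renames inside one transaction, so readers see the old table until the switch.

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_extent(xml_text):
    """Toy stand-in for a patched parser: text of the first elements whose
    local names are beginPosition and endPosition (hypothetical)."""
    begin = end = None
    for el in ET.fromstring(xml_text).iter():
        local = el.tag.rsplit("}", 1)[-1]
        if local == "beginPosition" and begin is None:
            begin = el.text
        elif local == "endPosition" and end is None:
            end = el.text
    return begin, end


def reingest_and_swap(conn, parse=parse_extent):
    """Re-derive columns from stored XML into records_new, then swap it in."""
    conn.execute("DROP TABLE IF EXISTS records_new")
    conn.execute("CREATE TABLE records_new (identifier TEXT PRIMARY KEY, "
                 "xml TEXT, time_begin TEXT, time_end TEXT)")
    rows = conn.execute("SELECT identifier, xml FROM records").fetchall()
    for ident, xml_text in rows:
        begin, end = parse(xml_text)
        conn.execute("INSERT INTO records_new VALUES (?, ?, ?, ?)",
                     (ident, xml_text, begin, end))
    conn.commit()  # persist the rebuilt table; the live table is untouched

    # The "switch": drop and rename together in one explicit transaction
    conn.execute("BEGIN")
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP TABLE records")
        conn.execute("ALTER TABLE records_new RENAME TO records")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (identifier TEXT PRIMARY KEY, "
                 "xml TEXT, time_begin TEXT, time_end TEXT)")
    doc = ('<r xmlns:gml="http://www.opengis.net/gml">'
           '<gml:beginPosition>2004-08-12</gml:beginPosition>'
           '<gml:endPosition>2011-07-11</gml:endPosition></r>')
    # Simulate a record loaded before the fix: derived columns are empty
    conn.execute("INSERT INTO records VALUES (?, ?, NULL, NULL)", ("whots", doc))
    conn.commit()
    reingest_and_swap(conn)
    print(conn.execute("SELECT time_begin, time_end FROM records").fetchall())
```

Building into a side table costs extra disk space while the rebuild runs, but queries keep working throughout. That appears to be the motivation for switching databases only once indexing completes.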

rsignell-usgs commented 9 years ago

@kalxas, will this fix data.noaa.gov as well as data.gov?

Or will we need to apply the same process there?

kvuppala commented 9 years ago

@rsignell-usgs I'm not sure about the NOAA instance; the fix and the re-ingest of data need to be completed on the NOAA side as well.

kalxas commented 9 years ago

@kvuppala, thanks for the update. @rsignell-usgs, NOAA probably needs to re-ingest as well.

rsignell-usgs commented 9 years ago

@Yuanjie0913, I'm not sure who is in charge of data.noaa.gov.
Can you make sure that the proper people are informed?

kalxas commented 9 years ago

@rsignell-usgs we have to make sure of the OWSLib version that NOAA uses before we recommend a patch.

rsignell-usgs commented 9 years ago

@kvuppala, I just tried my notebook test again and it looks like we still have the issue, so I guess the re-ingest and database switch have not occurred yet, right? Any ETA on this? (I'm just curious how long the process takes; I know there are a lot of records.)

Yuanjie0913 commented 9 years ago

It would be nice if the temporal search could be narrowed down to date and time, not just date.
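Supporting date-and-time bounds raises a comparison question whenever the stored extents are date-only, like the NODC record's `2004-08-12`. One option, sketched here under stated assumptions, is to normalize both sides to timezone-aware UTC datetimes before applying the same overlap test used for the CSW filter. The thread does not say how pycsw stores or compares these values. Treating naive times as UTC and widening a bare end date to the last instant of that day are choices made for this sketch only.

```python
from datetime import date, datetime, timezone


def to_utc(value, end_of_day=False):
    """Parse an ISO 8601 date ('2004-08-12') or datetime
    ('2004-08-12T06:30:00Z') into an aware UTC datetime.

    A bare date becomes the start of that day, or its last microsecond
    when end_of_day=True, so that dates and times can be compared.
    """
    if "T" not in value:
        d = date.fromisoformat(value)
        t = datetime.max.time() if end_of_day else datetime.min.time()
        return datetime.combine(d, t, tzinfo=timezone.utc)
    if value.endswith("Z"):
        # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc)


def overlaps(begin, end, start, stop):
    """True if the dataset extent [begin, end] overlaps the query [start, stop]."""
    return (to_utc(begin) <= to_utc(stop, end_of_day=True)
            and to_utc(end, end_of_day=True) >= to_utc(start))
```

With this convention, a dataset ending on `2011-07-11` still matches a query window starting at `2011-07-11T18:00:00Z`, and does not match one starting on the following day.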

kvuppala commented 9 years ago

@rsignell-usgs We are hoping the indexing of the data might finish over the weekend or early next week; after that we will need to switch the databases for the change to take effect. We will update here once completed.

kvuppala commented 9 years ago

@rsignell-usgs Indexing is complete; can you try the query again?

rsignell-usgs commented 9 years ago

Woohoo!! Now working! 151 datasets instead of zero! http://nbviewer.ipython.org/gist/rsignell-usgs/64f3ccd39ecb7c69cd18 Great job guys!

kvuppala commented 9 years ago

:+1:

kalxas commented 9 years ago

cool