Unidata / siphon

Siphon - A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server.
https://unidata.github.io/siphon
BSD 3-Clause "New" or "Revised" License

TDSCatalog has no datasets #114

Closed. rabernat closed this issue 7 years ago.

rabernat commented 8 years ago

I am trying to use Siphon for a simple case: listing all datasets in a THREDDS catalog, such as http://oceandata.sci.gsfc.nasa.gov/opendap/SeaWiFS/L3SMI/2000/001/contents.html

However, it does not return any datasets.

Example:

from siphon.catalog import TDSCatalog
catalog = 'http://oceandata.sci.gsfc.nasa.gov/opendap/SeaWiFS/L3SMI/2001/001/catalog.xml'
cat = TDSCatalog(catalog)
print(cat.catalog_refs)
print(cat.datasets)

This gives two empty OrderedDicts:

OrderedDict()
OrderedDict()

What am I doing wrong here?

lesserwhirls commented 8 years ago

Hi @rabernat, the server you are hitting is a HYRAX server, which does not advertise its data holdings via THREDDS catalogs. If you try the URL in a browser, you'll see there is no xml catalog to be parsed. However, if you go to any one of the datasets from the html page, you will see a "Data URL" that you can put directly into the netCDF4.Dataset class to read the data via OPeNDAP.
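A minimal sketch of that workflow, assuming netCDF4-python is installed with OPeNDAP (DAP) support and that each dataset's HTML page is the standard DAP2 data request form, whose URL is the Data URL with ".html" appended. The helper names data_url_from_form_page and open_opendap are hypothetical; they are not part of Siphon or netCDF4.

```python
# Sketch: open a HYRAX "Data URL" over OPeNDAP with netCDF4-python.

# Assumption: the dataset's HTML page is the DAP2 data request form,
# whose URL is the Data URL with ".html" appended.
DAP2_FORM_SUFFIX = ".html"


def data_url_from_form_page(page_url):
    """Strip the assumed ".html" form suffix to recover the OPeNDAP Data URL."""
    if page_url.endswith(DAP2_FORM_SUFFIX):
        return page_url[: -len(DAP2_FORM_SUFFIX)]
    return page_url


def open_opendap(data_url):
    """Open a remote dataset via OPeNDAP.

    Needs network access and a netCDF4 build with DAP support.
    """
    # Imported here so data_url_from_form_page works even without netCDF4.
    from netCDF4 import Dataset

    return Dataset(data_url)
```

For example, `with open_opendap(data_url) as nc: print(list(nc.variables))` would list the variables in the remote file without copying it locally first.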

rabernat commented 8 years ago

Thanks @lesserwhirls for your quick reply.

> If you try the URL in a browser, you'll see there is no xml catalog to be parsed.

When I put http://oceandata.sci.gsfc.nasa.gov/opendap/SeaWiFS/L3SMI/2001/001/catalog.xml into the browser, I see a long XML file. Here is an excerpt:

<thredds:catalog xmlns:thredds="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:bes="http://xml.opendap.org/ns/bes/1.0#">
  <thredds:service name="dap" serviceType="OPeNDAP" base="/opendap/hyrax"/>
  <thredds:service name="file" serviceType="HTTPServer" base="/opendap/hyrax"/>
  <thredds:service name="wms" serviceType="WMS" base="/ncWMS/wms"/>
  <thredds:dataset name="/SeaWiFS/L3SMI/2001/001" ID="/opendap/hyrax/SeaWiFS/L3SMI/2001/001/">
    <thredds:dataset name="S2001001.L3m_DAY_CHL_chl_ocx_9km.nc" ID="/opendap/hyrax/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chl_ocx_9km.nc">
      <thredds:dataSize units="bytes">1990904</thredds:dataSize>
      <thredds:date type="modified">2015-10-01T21:23:02</thredds:date>
      <thredds:access serviceName="dap" urlPath="/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chl_ocx_9km.nc"/>
      <thredds:access serviceName="wms" urlPath="?DATASET=lds/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chl_ocx_9km.nc&SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities"/>
    </thredds:dataset>
    <thredds:dataset name="S2001001.L3m_DAY_CHL_chlor_a_9km.nc" ID="/opendap/hyrax/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chlor_a_9km.nc">
      <thredds:dataSize units="bytes">1973123</thredds:dataSize>
      <thredds:date type="modified">2015-10-01T21:23:14</thredds:date>
      <thredds:access serviceName="dap" urlPath="/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chlor_a_9km.nc"/>
      <thredds:access serviceName="wms" urlPath="?DATASET=lds/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chlor_a_9km.nc&SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities"/>
    </thredds:dataset>
    ...

This looks like a THREDDS catalog to me. I'm not sure I understand what you mean by "does not advertise its data holdings via THREDDS Catalogs". Is this not a THREDDS catalog?

> if you go to any one of the datasets from the html page, you will see a "Data URL" that you can put directly into the netCDF4.Dataset class to read the data via OPeNDAP.

If I wanted to manually follow the links, I would have no need for siphon. I am trying to automate this in a script.

lesserwhirls commented 8 years ago

OK, I'm not sure why I wasn't able to see the XML document before; I got a generic error page the last time I tried. I was unaware that HYRAX servers expose THREDDS catalogs, so that's my mistake.

The issue is that Siphon currently does not use the access elements in the XML document to build access_urls. None of the catalogs used to develop Siphon contained access elements, so this case was overlooked.
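A minimal sketch of what honoring access elements could look like, written with only the Python standard library rather than Siphon's internals (this is not the fix that was made in Siphon). Each access element's serviceName is looked up among the catalog's service elements, and the service base plus urlPath is resolved against the catalog URL. CATALOG_XML is a trimmed, well-formed version of the catalog excerpt quoted in this thread (WMS access elements dropped); the function name access_urls_from_catalog is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
NS = {"thredds": THREDDS_NS}

# Trimmed copy of the HYRAX catalog excerpt quoted in this thread.
CATALOG_XML = """\
<thredds:catalog xmlns:thredds="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0">
  <thredds:service name="dap" serviceType="OPeNDAP" base="/opendap/hyrax"/>
  <thredds:service name="file" serviceType="HTTPServer" base="/opendap/hyrax"/>
  <thredds:dataset name="/SeaWiFS/L3SMI/2001/001">
    <thredds:dataset name="S2001001.L3m_DAY_CHL_chl_ocx_9km.nc">
      <thredds:access serviceName="dap" urlPath="/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chl_ocx_9km.nc"/>
    </thredds:dataset>
    <thredds:dataset name="S2001001.L3m_DAY_CHL_chlor_a_9km.nc">
      <thredds:access serviceName="dap" urlPath="/SeaWiFS/L3SMI/2001/001/S2001001.L3m_DAY_CHL_chlor_a_9km.nc"/>
    </thredds:dataset>
  </thredds:dataset>
</thredds:catalog>
"""


def access_urls_from_catalog(catalog_xml, catalog_url):
    """Map dataset name -> {service type: absolute URL}, built from <access> elements."""
    root = ET.fromstring(catalog_xml)
    # Index every service by name so access elements can refer to it.
    services = {
        svc.get("name"): (svc.get("serviceType"), svc.get("base"))
        for svc in root.iter(f"{{{THREDDS_NS}}}service")
    }
    result = {}
    for ds in root.iter(f"{{{THREDDS_NS}}}dataset"):
        urls = {}
        # Only direct <access> children belong to this dataset.
        for acc in ds.findall("thredds:access", NS):
            svc = services.get(acc.get("serviceName"))
            if svc is None:
                continue  # access element names an unknown service; skip it
            svc_type, base = svc
            # Service base + urlPath, resolved relative to the catalog's own URL.
            urls[svc_type] = urljoin(catalog_url, base + acc.get("urlPath"))
        if urls:  # container datasets without access elements are skipped
            result[ds.get("name")] = urls
    return result
```

In a script, the XML would come from fetching the catalog URL (for example with urllib.request.urlopen(url).read(), which ET.fromstring also accepts as bytes), and each "OPeNDAP" URL could then be passed to netCDF4.Dataset.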

I've opened an issue to address this bug. Thanks!

rabernat commented 8 years ago

Fantastic, thanks!

dopplershift commented 7 years ago

@rabernat I just got your feedback from the AOSPy workshop. I'll move this up my priority list, but I'm a bit swamped at the moment, so it might be early December before I can look into it.

rabernat commented 7 years ago

@dopplershift: It's not urgent! Work-arounds have been found.