Unidata / siphon

Siphon - A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server.
https://unidata.github.io/siphon
BSD 3-Clause "New" or "Revised" License

No access urls created for Dataset #734

Open dopplershift opened 10 months ago

dopplershift commented 10 months ago

Not sure what's going on, but this code:

from siphon.catalog import TDSCatalog
thredds_url_hind = "https://tds.hycom.org/thredds/catalogs/GLBy0.08/expt_93.0.xml"
cat_hind = TDSCatalog(thredds_url_hind)
ds_hind = cat_hind.datasets[0].remote_access()

fails due to no OPeNDAP (or any other service) url being generated in access_urls. Manually clicking the link on the HTML catalog page gives a set of access URLs that seem to work fine.
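
While debugging, it can help to check access_urls before calling remote_access(), so that the failure names the services that were actually found. The helper below is a minimal sketch: pick_access_url and its preferred argument are hypothetical names, not part of siphon, and the sample dictionary uses a made-up URL.

```python
def pick_access_url(access_urls, preferred=("OPENDAP", "NetcdfSubset")):
    """Return (service, url) for the first preferred service that is present.

    `access_urls` is any mapping of service type to URL, such as the
    `access_urls` attribute discussed in this issue.
    """
    for service in preferred:
        if service in access_urls:
            return service, access_urls[service]
    # Report what was available so an empty mapping is obvious at a glance.
    raise LookupError(
        f"none of {list(preferred)} available; found {sorted(access_urls)}"
    )


# Hypothetical sample data, for illustration only.
sample_urls = {"OPENDAP": "https://example.org/thredds/dodsC/some/dataset"}
chosen = pick_access_url(sample_urls)
```

With the failing catalog, passing cat_hind.datasets[0].access_urls would raise LookupError with an empty "found" list, which matches the symptom described above.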

Taken from this Stack Overflow question.

tdrwenski commented 10 months ago

Possible duplicate of https://github.com/Unidata/siphon/issues/715. There I also saw the access URLs in the THREDDS HTML page, but they were missing in siphon. Both situations seem to involve a nested catalog too.

jamespolly commented 10 months ago

Agreed with @tdrwenski that this smells similar. I'd love to help here but will need a nudge in a direction.

jamespolly commented 10 months ago

Adding a little more detail here, hopefully useful.

For reference, a working example:

from siphon.catalog import TDSCatalog
thredds_url_fmrc = "https://tds.hycom.org/thredds/catalog/GLBy0.08/expt_93.0/FMRC/runs/catalog.xml"
cat_fmrc = TDSCatalog(thredds_url_fmrc)
ds_fmrc = cat_fmrc.datasets[0].remote_access()

where cat_fmrc.datasets gives:

['GLBy0.08_930_FMRC_RUN_2023-09-19T12:00:00Z', 'GLBy0.08_930_FMRC_RUN_2023-09-18T12:00:00Z', 'GLBy0.08_930_FMRC_RUN_2023-09-17T12:00:00Z', 'GLBy0.08_930_FMRC_RUN_2023-09-16T12:00:00Z', 'GLBy0.08_930_FMRC_RUN_2023-09-15T12:00:00Z', 'GLBy0.08_930_FMRC_RUN_2023-09-14T12:00:00Z']

and cat_fmrc.datasets[0].access_urls gives:

{'OPENDAP': 'https://tds.hycom.org/thredds/dodsC/GLBy0.08/expt_93.0/FMRC/runs/GLBy0.08_930_FMRC_RUN_2023-09-19T12:00:00Z',
 'NetcdfSubset': 'https://ncss.hycom.org/thredds/ncss/grid/GLBy0.08/expt_93.0/FMRC/runs/GLBy0.08_930_FMRC_RUN_2023-09-19T12:00:00Z',
 'WMS': 'https://wms.hycom.org/thredds/wms/GLBy0.08/expt_93.0/FMRC/runs/GLBy0.08_930_FMRC_RUN_2023-09-19T12:00:00Z',
 'WCS': 'https://wcs.hycom.org/thredds/wcs/GLBy0.08/expt_93.0/FMRC/runs/GLBy0.08_930_FMRC_RUN_2023-09-19T12:00:00Z'}

and

In [8]: cat_fmrc.services
Out[8]: [<siphon.catalog.CompoundService at 0x7efc9b79fcd0>]

In [9]: cat_fmrc.services[0]
Out[9]: <siphon.catalog.CompoundService at 0x7efc9b79fcd0>

In [10]: cat_fmrc.services[0].services[0]
Out[10]: <siphon.catalog.SimpleService at 0x7efc9b79fc40>

In [11]: cat_fmrc.services[0].services[0].service_type
Out[11]: 'OPENDAP'

Now the same stuff but for the non-working example shown in the first post (not repeated here): cat_hind.datasets gives:

['GLBy0.08_expt_93.0 (ssh, ts3z, and uv3z aggregated)', 'GLBy0.08_expt_93.0_ssh (sea_surface_elevation)', 'GLBy0.08_expt_93.0_ts3z (sea_water_temperature and sea_water_salinity)', 'GLBy0.08_expt_93.0_uv3z (eastward_sea_water_velocity and northward_sea_water_velocity)', 'GLBy0.08_expt_93.0_ice (sst, sss, ssu, ssv, sic, sih, siu, siv, surtx, surty)', 'GLBy0.08_expt_93.0_sur (qtot, emp, steric_ssh, ssh, u_barotropic_velocity, v_barotropic_velocity, surface_boundary_layer_thickness, mixed_layer_thickness)']

and cat_hind.datasets[i].access_urls returns {} for any i. Then cat_hind.services returns:

[<siphon.catalog.CompoundService at 0x7f2e9a0bf910>,
 <siphon.catalog.CompoundService at 0x7f2e9a0bf100>,
 <siphon.catalog.SimpleService at 0x7f2e9a0bf790>]

where cat_hind.services[2].service_type returns 'FTP' and cat_hind.services[1].services[0].service_type returns 'OPENDAP'.
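
Rather than indexing into each service by hand, the nesting can be listed in one pass. This is a sketch under assumptions: it relies only on the service_type and services attributes shown in this thread plus a name attribute, and it runs here against hypothetical stand-in objects rather than a live catalog. flatten_services is not a siphon function.

```python
from types import SimpleNamespace


def flatten_services(services):
    """Yield (name, service_type) for each service, descending into compound ones."""
    for service in services:
        yield service.name, service.service_type
        # Compound services keep their children in a `services` list;
        # simple services have no such attribute.
        yield from flatten_services(getattr(service, "services", []))


# Stand-ins that loosely mimic the layout reported for the failing catalog.
catalog_services = [
    SimpleNamespace(
        name="all",
        service_type="Compound",
        services=[
            SimpleNamespace(name="ncdods", service_type="OPENDAP"),
            SimpleNamespace(name="ncss", service_type="NetcdfSubset"),
        ],
    ),
    SimpleNamespace(name="ftp", service_type="FTP"),
]

service_types = dict(flatten_services(catalog_services))
```

Against a real catalog, the equivalent call would be dict(flatten_services(cat_hind.services)), assuming the service objects expose name as described.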

jamespolly commented 10 months ago

In catalog.make_access_urls() there are two clauses which, in the non-working example discussed here, are never satisfied or are empty:

  1. The first is on line 570: if service_name in all_service_dict:. This is never satisfied because service_name is None: cat_hind.metadata does not contain an entry for serviceName (per line 562).

Note that:

In [79]: all_service_dict
Out[79]: 
{'all': <siphon.catalog.CompoundService at 0x7f2e9a0bf910>,
 'all-ftp': <siphon.catalog.CompoundService at 0x7f2e9a0bf100>,
 'ftp': <siphon.catalog.SimpleService at 0x7f2e9a0bf790>,
 'ncdods': <siphon.catalog.SimpleService at 0x7f2e9a0bf550>,
 'ncss': <siphon.catalog.SimpleService at 0x7f2e9a0bf7f0>,
 'wms': <siphon.catalog.SimpleService at 0x7f2e9a0bf340>,
 'wcs': <siphon.catalog.SimpleService at 0x7f2e9a0bf940>}

In [80]: all_service_dict['ncdods'].service_type
Out[80]: 'OPENDAP'

  2. The second is on line 585: for service_type in self.access_element_info:. This attribute is empty for all cat_hind.datasets[i]. I'll note that in the working example, this attribute is also empty for all cat_fmrc.datasets[i].

Together, this results in the access_urls never being populated. I'll work on this some more in the coming days.
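
To make the failure mode concrete, here is a toy model of the two clauses. It is a simplified sketch based only on the description in this comment, not siphon's actual make_access_urls(); the function name, the dict-based service records, the server URL, and the dataset path are all hypothetical.

```python
def resolve_access_urls(service_name, all_service_dict, access_element_info,
                        server_url, url_path):
    """Toy model of the two clauses described above (not siphon's code)."""
    access_urls = {}

    # Clause 1: look up the service named by the dataset's serviceName metadata.
    # When service_name is None this test is False and nothing is added.
    if service_name in all_service_dict:
        service = all_service_dict[service_name]
        # A compound service contributes one URL per child service.
        for member in service.get("services", [service]):
            access_urls[member["type"]] = server_url + member["base"] + url_path

    # Clause 2: explicit access elements attached to the dataset itself.
    # An empty mapping means this loop body never runs.
    for name, path in access_element_info.items():
        if name in all_service_dict:
            member = all_service_dict[name]
            access_urls[member["type"]] = server_url + member["base"] + path

    return access_urls


# Hypothetical service table and dataset path, for illustration only.
ncdods = {"type": "OPENDAP", "base": "/thredds/dodsC/"}
services = {
    "all": {"type": "Compound", "base": "", "services": [ncdods]},
    "ncdods": ncdods,
}
server = "https://example.org"
path = "some/dataset"

with_service_name = resolve_access_urls("all", services, {}, server, path)
without_service_name = resolve_access_urls(None, services, {}, server, path)
```

In this model, with_service_name contains an OPENDAP entry while without_service_name is empty, which is the same shape as the difference between the working and non-working catalogs: no serviceName and no access elements leaves nothing to populate access_urls.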