Unidata / siphon

Siphon - A collection of Python utilities for retrieving atmospheric and oceanic data from remote sources, focusing on being able to retrieve data from Unidata data technologies, such as the THREDDS data server.
https://unidata.github.io/siphon
BSD 3-Clause "New" or "Revised" License

Add a walk function for navigating THREDDS catalogue #754

Open tlogan2000 opened 11 months ago

tlogan2000 commented 11 months ago

As far as I know, this functionality does not already exist, but I believe it would be a welcome addition:

I often need to find all datasets across multiple subfolders of a THREDDS catalogue. To do this, I resort to a custom function in my data processing scripts (see the simple example below), but ideally this would be built into siphon itself.

import logging

import requests
from siphon.catalog import TDSCatalog

LOGGER = logging.getLogger(__name__)


# walk function
def walk(cat, depth=1):
    """Return a generator walking a THREDDS data catalog for datasets.

    Parameters
    ----------
    cat : TDSCatalog
      THREDDS catalog.
    depth : int
      Maximum recursive depth. Setting 0 will return only datasets within the top-level catalog. If None,
      depth is set to 1000.
    """
    yield from cat.datasets.items()
    if depth is None:
        depth = 1000

    if depth > 0:
        for name, ref in cat.catalog_refs.items():
            try:
                child = ref.follow()
                yield from walk(child, depth=depth - 1)

            except requests.HTTPError as exc:
                LOGGER.exception(exc)

# create the catalogue (urlcat is the catalog URL, defined elsewhere)
cat = TDSCatalog(urlcat)
# access all datasets down to 20 levels of subfolders
for dd in walk(cat, depth=20):
    print(dd)

dopplershift commented 10 months ago

This seems like it could be a nice addition. Would you be interested in submitting a PR adding it? My only question is whether yielding from items() (so name, Dataset pairs) makes the most sense, or whether just the Dataset would be enough, since you could still get the name from ds.name?
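
For illustration, a minimal sketch of the Dataset-only variant raised in this question. The name `walk_datasets` and the `fake_catalog` helper are hypothetical and not part of siphon; the stand-in objects only mimic the `datasets`, `catalog_refs`, `follow()` and `name` attributes used in the example above, so the snippet runs without a THREDDS server. Error handling is left out for brevity.

```python
from types import SimpleNamespace


def walk_datasets(cat, depth=1):
    """Yield Dataset objects (not (name, Dataset) pairs) from a catalog tree.

    Relies only on cat.datasets, cat.catalog_refs and ref.follow().
    """
    yield from cat.datasets.values()
    if depth is None:
        depth = 1000
    if depth > 0:
        for ref in cat.catalog_refs.values():
            yield from walk_datasets(ref.follow(), depth=depth - 1)


# Hypothetical offline stand-in for a catalog; NOT a siphon API.
def fake_catalog(dataset_names, children=()):
    datasets = {n: SimpleNamespace(name=n) for n in dataset_names}
    refs = {
        f"ref{i}": SimpleNamespace(follow=lambda c=c: c)
        for i, c in enumerate(children)
    }
    return SimpleNamespace(datasets=datasets, catalog_refs=refs)


leaf = fake_catalog(["c.nc"])
top = fake_catalog(["a.nc", "b.nc"], children=[leaf])

# The name is still available from each yielded object via ds.name.
for ds in walk_datasets(top, depth=1):
    print(ds.name)
```

With this shape, callers who want pairs can still build them with `(ds.name, ds)`, whereas yielding pairs forces every caller to unpack.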

tlogan2000 commented 10 months ago

@dopplershift Sorry for the delay. Yes, I can try to put something together in the coming weeks. Would the most logical place for the addition simply be a new method on the catalogue class?
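
One possible shape for the method form, sketched under assumptions: `WalkMixin` and `FakeCatalog` are hypothetical names used only for illustration, and the stand-in objects mimic the `datasets`, `catalog_refs` and `follow()` attributes rather than using siphon itself. The sketch is written so that child catalogs returned by `follow()` do not need to carry the method themselves.

```python
from types import SimpleNamespace


class WalkMixin:
    """Hypothetical mixin adding a walk() method to a catalog-like class."""

    def walk(self, depth=1):
        """Yield datasets from this catalog and its sub-catalogs."""
        yield from self.datasets.values()
        if depth is None:
            depth = 1000
        if depth > 0:
            for ref in self.catalog_refs.values():
                child = ref.follow()
                # Call through the class so plain child catalogs
                # (without a walk attribute) are handled as well.
                yield from WalkMixin.walk(child, depth=depth - 1)


class FakeCatalog(WalkMixin):
    """Offline stand-in for a catalog class; NOT a siphon API."""

    def __init__(self, names, children=()):
        self.datasets = {n: SimpleNamespace(name=n) for n in names}
        self.catalog_refs = {
            f"ref{i}": SimpleNamespace(follow=lambda c=c: c)
            for i, c in enumerate(children)
        }


# A child that lacks the mixin, to show that recursion does not depend on it.
plain_child = SimpleNamespace(
    datasets={"c.nc": SimpleNamespace(name="c.nc")}, catalog_refs={}
)
top_cat = FakeCatalog(["a.nc"], children=[plain_child])
names = [ds.name for ds in top_cat.walk(depth=2)]
print(names)
```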