band_names -> bands following 81665b1

wkearn commented 7 years ago

I encountered this while hunting down some errors I get when trying to run batch. This is not the end of that struggle, but it seemed a pretty straightforward change. Let me know if I misunderstood what was supposed to be going on here.

coveralls commented 7 years ago

Coverage remained the same at 54.723% when pulling 784d6474a80d8d5989443fae17795b5287215397 on wkearn:band_names into d7f90f859153dfc76fc744928261611495158e97 on ceholden:master.

ceholden commented 7 years ago

Thanks, Will! Thanks for catching this -- I basically haven't run yatsm batch since I (re)started mucking with the IO API. The longer term goal of this submodule is basically a set of "drivers" that wrap individual data providers so the rest of YATSM can use a familiar API to reach data from USGS ARD data sources, the Australian Geoscience Data Cube, "stacked" data that we've traditionally used, and really anything that someone wants to use. I suspect the exact definition of the API will be in flux a bit until I get into implementing one of these other sources, but for getting to v0.7.0 I'll just stick with what seems to model the GDAL access route well enough.

I'm curious what you think about the following:

I like using bands for the read_dataarray call because it's shorter and matches better with indexes (which rasterio.DatasetReader.read uses as argument, so indexes should be familiar)
The config file still uses band_names, which I kinda like over just bands because it implies str inputs

At the risk of bike shedding... should they be the same name for consistency? If so, I think you just need to change the configuration file schema (here) to check for bands instead of band_names. I'm happy to merge now but just thought I'd ask you beforehand

Thanks! I'm really thrilled you took a look and would be happy to chat next week when I'm back.

wkearn commented 7 years ago

Re bands vs. band_names: you'll have to tell me what the pythonic approach to these things is, but I would tend to prefer that the names indicate the types that you're expecting, which would prefer band_names for read_dataarray. To me, band_names suggests an array of strings while bands suggests something else: an array of Bands whatever those might be? the arrays of data themselves? an array of band indexes (which is really just indexes)?.

But I'm not super committed one way or the other. If I understand correctly, end users who are going to be writing the config files probably aren't going to call read_dataarray themselves.

Re batch: is there a minimal make-it-go example somewhere for now? I have a bunch of Landsat stacks and a YAML config file that roughly follows this example, and I'd like to run the pipeline just on my local machine.

ceholden / yatsm

band_names -> bands following 81665b1 #96