aleksandervines opened this issue 8 years ago (Open)
Totally agree. Right now lat/lon are hardcoded in the template, which is OK if, say, the instrument platform records lat/lon/depth/time with each observation, like a glider does. To really make this happen, we need to figure out a clean way of extracting metadata from the header [Unidata/Rosetta#5].
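To make the idea concrete, a minimal sketch of that kind of header mining could look like the following (Python, purely illustrative; it assumes a hypothetical header of `# key: value` comment lines above the data table, which is not necessarily what any particular instrument writes):

```python
# Minimal sketch, assuming a hypothetical header of "# key: value" comment
# lines above the data table; real instrument headers will vary.
def extract_header_metadata(path, comment_char="#"):
    """Collect key/value pairs from comment lines at the top of a data file."""
    metadata = {}
    with open(path) as f:
        for line in f:
            if not line.startswith(comment_char):
                break  # first non-comment line marks the start of the data table
            key, sep, value = line.lstrip(comment_char).partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
    return metadata
```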
Rosetta can already work with Excel spreadsheets; it essentially converts them to CSV programmatically on the backend. What we do not have at this point is a way of associating an external metadata file with a data file (or a set of data files to be bulk processed). For that, we would need a way of describing where the relevant metadata live in the external metadata file. I am thinking that could be tied into the data header mining code.
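For example, the "description of where the metadata live" could be as simple as a small mapping from columns in the external file to attribute names, keyed by something that identifies each data file. Everything below is an assumption for the sake of illustration (Python; the column names, the keying on a `data_file` column, and the function itself are hypothetical, not an existing Rosetta feature):

```python
import csv

# Hypothetical description of where metadata live in an external CSV:
# which column identifies the data file, and which columns map to which
# output attributes. None of this exists in Rosetta today.
METADATA_MAPPING = {
    "key_column": "data_file",          # column naming the data file each row describes
    "fields": {                          # external column -> output attribute
        "Latitude": "latitude",
        "Longitude": "longitude",
        "Station": "station_name",
    },
}

def load_external_metadata(metadata_csv, mapping=METADATA_MAPPING):
    """Return {data_file: {attribute: value}} read from an external metadata CSV."""
    table = {}
    with open(metadata_csv, newline="") as f:
        for row in csv.DictReader(f):
            key = row[mapping["key_column"]]
            table[key] = {attr: row[col] for col, attr in mapping["fields"].items()}
    return table
```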
The bulk-processing workflow isn't fully figured out yet, so I am open to suggestions as to what would make sense from, say, a data scientist's point of view. It is helpful to know about the tendency to collect a cruise's metadata in a single spreadsheet. My background is in observational meteorology, so I lack perspective from the other observationally based geosciences.
I am requesting this because we would like to bulk process a large number of CTD casts.
Some of these CTD casts are separate time series at the same station, but most of the casts have unique lat/lon coordinates, which means we would also need a good way of supplying these values. As far as I understand, lat/lon are currently "hardcoded" into the templates. We would prefer not to create more than one template to process all of our CTD casts, since the ASCII files are produced by the same software (SD200W). Because the CTD instrument and its software do not record any coordinate references, these have been written down manually on a form and we need to digitise them in some way. In my experience, there seems to be a tendency to do this by entering all the data from, e.g., a cruise or a dataset into one spreadsheet. I would love it if there were also functionality to extract some metadata from such a spreadsheet file (probably easiest to work with in CSV format) during bulk processing.
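To illustrate what we are after, the bulk workflow could look roughly like this (Python sketch only; `load_external_metadata()` is the hypothetical helper from the example above, and `process_cast()` is merely a stand-in for applying one shared Rosetta template to a single SD200W ASCII file, not a real function):

```python
import glob
import os

# Sketch of the desired bulk workflow: one shared template, with per-cast
# lat/lon pulled from the cruise spreadsheet (exported to CSV) instead of
# being hardcoded in the template. process_cast() below is only a stand-in
# for applying a Rosetta template to one SD200W ASCII file.
def process_cast(path, template, extra_metadata):
    print(f"{path}: template={template}, extra metadata={extra_metadata}")

def bulk_process(data_dir, cruise_csv, template):
    coords = load_external_metadata(cruise_csv)              # helper from the sketch above
    for path in sorted(glob.glob(os.path.join(data_dir, "*.asc"))):
        name = os.path.basename(path)
        cast_metadata = coords.get(name, {})                  # lat/lon copied from the cruise form
        process_cast(path, template, extra_metadata=cast_metadata)
```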
This coordinate/extra-metadata issue is not strictly necessary for the bulk-processing request in general, but I think it is worth keeping in mind while designing the bulk-processing workflow.