CADWRDeltaModeling / vtools3

Pandas/xarray implementation of the most important functionality of vtools, emphasizing csv and netcdf as data sources.
https://cadwrdeltamodeling.github.io/vtools3/
Apache License 2.0

read_ts #14

Closed water-e closed 4 years ago

water-e commented 4 years ago

We need to convert read_ts to produce pandas or xarray series. The code in read_scalar is a wrapper around pd.read_csv that covers many, but not all, use cases. USGS will be the hardest, because it needs the added step of converting to PST.
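To make the PST step concrete, here is a minimal sketch. It assumes the timestamps carry an explicit UTC offset, which may not match the actual USGS files. The function name, column names, and sample data are all hypothetical, and this is not the read_scalar code. PST is treated as a fixed UTC-8 offset with no daylight saving.

```python
import io
import pandas as pd

# Hypothetical sample: one PDT stamp (-07:00) and one PST stamp (-08:00).
SAMPLE = """datetime,value
2020-07-01T01:00:00-07:00,1.5
2020-12-01T01:00:00-08:00,1.7
"""

# Assumption: "PST" means a fixed UTC-8 offset all year.
PST_OFFSET = pd.Timedelta(hours=-8)


def read_csv_as_pst(source, time_col="datetime"):
    """Read a CSV and return a frame indexed by naive PST timestamps."""
    df = pd.read_csv(source)
    # Normalize every stamp to UTC, whatever offset it was written with.
    utc = pd.to_datetime(df[time_col], utc=True)
    # Drop the tz info (keeping UTC wall time), then shift to PST.
    pst = utc.dt.tz_localize(None) + PST_OFFSET
    out = df.drop(columns=[time_col])
    out.index = pd.DatetimeIndex(pst, name=time_col)
    return out


ts = read_csv_as_pst(io.StringIO(SAMPLE))
print(ts)
```

The July stamp written in PDT moves back an hour, while the December stamp is unchanged.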

The previous API in pyschism was ts = read_ts(path, selector). The "selector" argument was only sparsely implemented, mostly for USGS. Tabular data with many columns is likely to be more common going forward, so I suggest selector just be a list of column names. The specific readers will still need to know the magic columns and quirks, and it pays not to fall back from the C time parsers to the Python ones.
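A rough sketch of what "selector as a list of column names" could look like on top of pd.read_csv. The name read_ts_sketch, the time_col parameter, and the sample columns are illustrative assumptions, not the vtools3 signature.

```python
import io
import pandas as pd

# Hypothetical multi-column file.
DATA = """datetime,stage,flow,qaqc
2020-01-01 00:00,1.0,100.0,A
2020-01-01 00:15,1.1,101.0,A
"""


def read_ts_sketch(path, selector=None, time_col="datetime"):
    """Read a time-indexed CSV, keeping only the columns named in selector."""
    # usecols limits parsing to the time column plus the requested columns.
    usecols = None if selector is None else [time_col, *selector]
    return pd.read_csv(
        path,
        usecols=usecols,
        index_col=time_col,
        parse_dates=[time_col],  # let read_csv parse the times itself
    )


ts = read_ts_sketch(io.StringIO(DATA), selector=["flow"])
print(ts)
```

Passing selector through usecols means the unwanted columns are never parsed, which keeps the reader cheap on wide files.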

Issues: Shall we continue to emphasize the current read_ts use of sniffers? Should we start to prefer hints or rely more on file extensions? If we go the sniffer route we can base it off the top X lines of the file.

kjnam commented 4 years ago

Is the read_scalar function the new way? We can build features around it. I do not think the file extension is enough to identify the file format, because extensions such as txt and csv are generic in many cases. Maybe we can separate out the sniffer so that it can be used by readers or passed as an argument to a reader.
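One way the separated sniffer could look, working only from the top lines of the file. The format names and the matching rules here are placeholders for illustration (the "agency_cd" check is just a guess at a USGS-style header), not the existing sniffers.

```python
import io
from itertools import islice


def read_head(stream, nlines=20):
    """Return up to nlines lines from the top of an open text stream."""
    return list(islice(stream, nlines))


def first_data_line(head):
    """First line that is not a '#' comment, or '' if there is none."""
    return next((line for line in head if not line.startswith("#")), "")


# Hypothetical registry: format name -> predicate on the head lines.
# Order matters; the first predicate that matches wins.
SNIFFERS = {
    "usgs_like": lambda head: any(line.startswith("agency_cd") for line in head),
    "csv": lambda head: "," in first_data_line(head),
}


def sniff_format(head, sniffers=SNIFFERS):
    """Return the name of the first matching format, or None."""
    for name, matches in sniffers.items():
        if matches(head):
            return name
    return None


head = read_head(io.StringIO("# comment\ndatetime,value\n2020-01-01,1.0\n"))
print(sniff_format(head))
```

Because the registry is just an argument, a reader could accept its own sniffers, or a caller could skip sniffing entirely and pass a format hint.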

water-e commented 4 years ago

I have collapsed the name back to read_ts. The engine is the csv_retrieve_ts function, which wraps a bunch of checking, regular time stamping, and flag application chores. The argument lists have gotten big, and so has the number of callbacks, so I think we could go back to a class-based design. The number of special handling items was bigger than I expected going in. Anyhow, I'm going to close this issue and we can make improvements a new task.
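For the follow-up task, a sketch of how a class-based design might replace the long argument lists and callbacks: the callbacks become overridable methods and the per-format quirks become class attributes. Every class, attribute, and column name below is hypothetical. This is not how csv_retrieve_ts is written.

```python
import io
import pandas as pd


class CsvTsReader:
    """Hypothetical base reader; subclasses override attributes and hooks."""

    time_col = "datetime"
    read_csv_kwargs = {}  # per-format pd.read_csv options

    def read(self, source, selector=None):
        df = pd.read_csv(source, **self.read_csv_kwargs)
        df = self.set_time_index(df)
        df = self.apply_flags(df)
        if selector is not None:
            df = df[list(selector)]
        return df

    def set_time_index(self, df):
        df[self.time_col] = pd.to_datetime(df[self.time_col])
        return df.set_index(self.time_col)

    def apply_flags(self, df):
        return df  # default: no quality flags to apply


class FlaggedReader(CsvTsReader):
    """Example subclass: blank out values whose flag marks them as bad."""

    flag_col = "qaqc"
    bad_flags = ("X",)

    def apply_flags(self, df):
        bad = df[self.flag_col].isin(self.bad_flags)
        value_cols = df.columns.drop(self.flag_col)
        df.loc[bad, value_cols] = float("nan")
        return df.drop(columns=[self.flag_col])


DATA = """datetime,value,qaqc
2020-01-01 00:00,1.0,A
2020-01-01 00:15,9.9,X
"""

ts = FlaggedReader().read(io.StringIO(DATA))
print(ts)
```

The special handling for each source then lives in one subclass instead of being threaded through a shared function signature.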