
HydrologicEngineeringCenter / FIRO_TSEnsembles

Time series of ensembles in SqLite


Add support for Regular Time Series #45

Open HenryGeorgist opened 4 years ago

HenryGeorgist commented 4 years ago

Please update the table model to allow for regular time series

HenryGeorgist commented 4 years ago

@MikeNeilson I think we should do this item first, then we can work on #46

MikeNeilson commented 4 years ago

That is my plan.

MikeNeilson commented 4 years ago

Do we want to enforce a naming scheme that includes information like interval, or do we want to let the name be a completely arbitrary string and store the other information as metadata?

I'm inclined toward the metadata approach, but I don't think the other is valueless.

MikeNeilson commented 4 years ago

Welp, realized we have to do both, as we need a fixed ID in the table of time series. I'll try to use characters that are almost never used to separate that info.

ktarbet commented 4 years ago

Can you please give an example table name?

MikeNeilson commented 4 years ago

I don't have a complete example yet, but for a partial example I have the following metadata in one of my test time series CSV files:

interval: P1D
units: raw
name: RegularIntervalTest_of_1Day_without_leapyear

We can't have just RegularIntervalTest_of_1Day_without_leapyear in the catalog, as I might also have a 15-minute or hourly series with the same name.

My thought is some list arbitrary name|..

MikeNeilson commented 4 years ago

Make that arbitrary name|<interval>.<duration>.<units> or something like that. Sorry, the special characters got destroyed by the wiki processor.
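The `arbitrary name|<interval>.<duration>.<units>` idea can be sketched as a small value type that builds and parses such an ID. This is only an illustration of the proposal, not code from the repository: `SeriesId`, `to_id`, and `from_id` are hypothetical names, the `|` separator is taken from the comment above, and the `PT0S` duration is an assumed value (the test metadata quoted earlier lists only interval, units, and name).

```python
from dataclasses import dataclass

SEP = "|"  # separator proposed in the thread; must not appear in the name


@dataclass(frozen=True)
class SeriesId:
    """Hypothetical ID: an arbitrary name plus interval, duration, and units."""
    name: str
    interval: str   # ISO 8601 duration, e.g. "P1D" or "PT1H"
    duration: str   # ISO 8601 duration, e.g. "PT0S"
    units: str

    def to_id(self) -> str:
        if SEP in self.name:
            raise ValueError(f"name must not contain {SEP!r}")
        return f"{self.name}{SEP}{self.interval}.{self.duration}.{self.units}"

    @classmethod
    def from_id(cls, text: str) -> "SeriesId":
        # Split on the last separator, then on the first two dots only,
        # so the units field may itself contain dots.
        name, extra = text.rsplit(SEP, 1)
        interval, duration, units = extra.split(".", 2)
        return cls(name, interval, duration, units)


# Same arbitrary name, different intervals: the IDs no longer collide.
daily = SeriesId("RegularIntervalTest_of_1Day_without_leapyear", "P1D", "PT0S", "raw")
hourly = SeriesId("RegularIntervalTest_of_1Day_without_leapyear", "PT1H", "PT0S", "raw")
print(daily.to_id())
print(hourly.to_id())
```

Keeping the name free-form and appending the rest after a rarely used character means the interval still distinguishes two series that share a name.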

ktarbet commented 4 years ago

How about ts and pd prefixes for time series and paired data?

MikeNeilson commented 4 years ago

Don't know if I want to encode metadata as a prefix like that.

Perhaps something like:

ensemble|<name>|other information
timeseries|<name>|other information
paired|<name>|other information

That would be the catalog view. Then the unique index on the catalog table would be (datatype, name, extra).

This way I can have the following:

ensemble|Folsom Inflow|PT1H.PT0S.kcfs
timeseries|Folsom Inflow|PT1H.PT1H.cfs
paired|Folsom.Elev.Stor|ft->ac-ft

That way similarly named things can be grouped together.
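A minimal sketch of that catalog idea, using Python's built-in `sqlite3` module against an in-memory database. The table and column layout here is an assumption made for illustration (the thread names only the three index columns `datatype`, `name`, `extra`); the actual schema in the repository may differ. The helper names `catalog_view` and `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed layout: a fixed integer id plus the three columns from the thread,
# with a unique constraint over (datatype, name, extra).
conn.execute("""
    CREATE TABLE catalog (
        id       INTEGER PRIMARY KEY,
        datatype TEXT NOT NULL,
        name     TEXT NOT NULL,
        extra    TEXT NOT NULL,
        UNIQUE (datatype, name, extra)
    )
""")

INSERT = "INSERT INTO catalog (datatype, name, extra) VALUES (?, ?, ?)"

# The three example entries from the comment above.
conn.executemany(INSERT, [
    ("ensemble", "Folsom Inflow", "PT1H.PT0S.kcfs"),
    ("timeseries", "Folsom Inflow", "PT1H.PT1H.cfs"),
    ("paired", "Folsom.Elev.Stor", "ft->ac-ft"),
])


def catalog_view(connection):
    """Return pipe-delimited catalog entries, grouped by name."""
    cursor = connection.execute(
        "SELECT datatype || '|' || name || '|' || extra "
        "FROM catalog ORDER BY name, datatype"
    )
    return [row[0] for row in cursor]


def try_insert(connection, row):
    """Insert a row; return False if the unique constraint rejects it."""
    try:
        connection.execute(INSERT, row)
        return True
    except sqlite3.IntegrityError:
        return False


for entry in catalog_view(conn):
    print(entry)
```

Sorting by `name` first puts the ensemble and the time series for "Folsom Inflow" next to each other, while the unique constraint still allows a second "Folsom Inflow" time series as long as its `extra` field differs.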

MikeNeilson commented 4 years ago

And I just remembered that TimeSeriesIdentifier exists. I'm going to have to update that.

MikeNeilson commented 4 years ago

For anyone interested, I've uploaded today's work to the branch timeseries. The test doesn't pass yet. That's intentional: I haven't updated the database schema yet and some of the data point insert logic is a bit off. I had to make a fair number of interface changes, so I'd like to get feedback before I go too much further.

HenryGeorgist commented 4 years ago

In general I think this is good. I think we need to consider how we want to deal with the identifier: what is it exactly, and what is appropriate for us to use? For example, it seems like the ensemble identifier implementation is missing critical information, which means that we have additional parameters on the method to gather a specific ensemble from the jdbctimeseriesdatabase. Maybe those parameters should be extracted into the ensembleidentifier?

MikeNeilson commented 4 years ago

@ktarbet I've left the data as is for now. If you can describe what other metadata is required to uniquely identify an ensemble, I can work on tweaking the interfaces and shifting data around in the tables.

HenryGeorgist commented 4 years ago

I have verified that this works: I can store time series data, but I don't have a great way to confirm my data is valid without writing code. I am not opposed to that, but it does make validating my testing a bit harder. I am also observing that the resulting file is larger than the original DSS file the time series came from.

MikeNeilson commented 4 years ago

The larger size doesn't surprise me for this initial cut. The Blocked storage takes a year of data and just does gzip on it. Depending on the interval, that may not be very efficient. A year of 15-minute rain: probably good. Daily flow: probably not. Once we have more varied and real data available we can start measuring and making improvements.
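The interval effect described above can be illustrated with a rough sketch. It assumes each value is stored as an 8-byte little-endian double before gzip is applied, which is a guess for illustration and not necessarily how the Blocked storage encodes a year of data. The data is synthetic: mostly-zero 15-minute rain, and daily flow drawn as uniform random noise, which is a pessimistic stand-in because real flow is smoother and would likely compress better. `gzip_ratio` is a hypothetical helper name.

```python
import gzip
import random
import struct


def gzip_ratio(values):
    """Compressed size divided by raw size for a block of doubles."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return len(gzip.compress(raw)) / len(raw)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One non-leap year of 15-minute rain: 96 values per day, almost all zero.
rain_15min = [0.0] * (365 * 96)
for index in rng.sample(range(len(rain_15min)), 500):
    rain_15min[index] = round(rng.uniform(0.01, 0.5), 2)

# One non-leap year of daily flow: only 365 values, none of them repeated.
daily_flow = [rng.uniform(100.0, 5000.0) for _ in range(365)]

rain_ratio = gzip_ratio(rain_15min)
flow_ratio = gzip_ratio(daily_flow)
print(f"15-minute rain: {len(rain_15min)} values, ratio {rain_ratio:.3f}")
print(f"daily flow:     {len(daily_flow)} values, ratio {flow_ratio:.3f}")
```

A lower ratio means better compression. The long runs of zeros in the rain block give gzip a lot to work with, while the short daily block offers little redundancy, so a per-year block is a much better fit for the former than the latter.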