Open HenryGeorgist opened 4 years ago
@MikeNeilson I think we should do this item first, then we can work on #46
That is my plan.
Do we want to enforce a naming scheme that includes information like interval, or do we want to let the name be a completely arbitrary string and store the other information as meta data?
I'm inclined to the meta data approach but I don't think the other is valueless.
welp, realized we have to do both, as we need a fixed ID in the table of time series. I'll try to use characters that are almost never used to separate that info.
can you please give an example table name?
I don't have a complete example yet. But for a partial example I have the following meta data in one of my test time series CSV files: interval: P1D, units: raw, name: RegularIntervalTest_of_1Day_without_leapyear
we can't have just RegularIntervalTest_of_1Day_without_leapyear in the catalog, as I might also have a 15-minute or hourly series with the same name.
My thought is some list
arbitrary name|
make that arbitrary name|<interval>.<duration>.<units>
or something like that. Sorry, the special characters got destroyed by the wiki processor.
how about ts and pd prefixes for time series and paired data?
don't know if I want to encode meta data information as a prefix like that.
perhaps something like:
ensemble|<name>|other information
timeseries|<name>|other information
paired|<name>|other information
That would be the catalog view. Then the unique index on the catalog table would be (datatype,name,extra)
This way I can have the following:
ensemble|Folsom Inflow|PT1H.PT0S.kcfs
timeseries|Folsom Inflow|PT1H.PT1H.cfs
paired|Folsom.Elev.Stor|ft->ac-ft
That way similarly named things can be grouped together.
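To make the (datatype,name,extra) idea concrete, here is a minimal sketch in Python with an in-memory SQLite table. It is not the project's schema or code: the `catalog` table, its column names, `CatalogEntry`, `create_catalog`, and `add_entry` are all hypothetical names chosen for illustration. It only shows that a pipe-separated ID can be split into three parts and that a unique index on those parts rejects duplicates while still letting similarly named entries sit side by side.

```python
import sqlite3
from typing import NamedTuple


class CatalogEntry(NamedTuple):
    """Hypothetical catalog row: (datatype, name, extra)."""
    datatype: str
    name: str
    extra: str

    def to_id(self) -> str:
        # Join the three parts back into the pipe-separated catalog view.
        return "|".join(self)

    @classmethod
    def from_id(cls, identifier: str) -> "CatalogEntry":
        # maxsplit=2 keeps any further '|' characters inside 'extra'.
        datatype, name, extra = identifier.split("|", 2)
        return cls(datatype, name, extra)


def create_catalog(conn: sqlite3.Connection) -> None:
    # Assumed table layout; the unique constraint mirrors (datatype,name,extra).
    conn.execute(
        "CREATE TABLE catalog ("
        " datatype TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " extra TEXT NOT NULL,"
        " UNIQUE (datatype, name, extra))"
    )


def add_entry(conn: sqlite3.Connection, entry: CatalogEntry) -> None:
    # Raises sqlite3.IntegrityError if the same triple is already present.
    conn.execute("INSERT INTO catalog VALUES (?, ?, ?)", entry)


conn = sqlite3.connect(":memory:")
create_catalog(conn)

# The three example IDs from the comment above.
ids = [
    "ensemble|Folsom Inflow|PT1H.PT0S.kcfs",
    "timeseries|Folsom Inflow|PT1H.PT1H.cfs",
    "paired|Folsom.Elev.Stor|ft->ac-ft",
]
for identifier in ids:
    add_entry(conn, CatalogEntry.from_id(identifier))

# Similarly named things can be pulled out together by name.
folsom_inflow = conn.execute(
    "SELECT datatype, extra FROM catalog WHERE name = ? ORDER BY datatype",
    ("Folsom Inflow",),
).fetchall()
```

Keeping the three parts as separate columns means the grouping by name is a plain query, and the pipe-separated string is only a display format.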
And I just remembered that TimeSeriesIdentifier exists. I'm going to have to update that.
For anyone interested, I've uploaded today's work to the branch timeseries. The test doesn't pass yet. That's intentional; I haven't updated the database schema yet and some of the data point insert logic is a bit off. I had to make a fair number of interface changes, so I'd like to get feedback before I go too much further.
In general I think this is good. I think we need to consider how we want to deal with the identifier: what is it exactly, and what is appropriate for us to use? For example, it seems like the ensemble identifier implementation is missing critical information, which means that we have additional parameters on the method to gather a specific ensemble from the jdbctimeseriesdatabase. Maybe those parameters should be extracted into the ensembleidentifier?
@ktarbet I've left the data as is for now. If you can describe what other meta data is required to uniquely identify an ensemble I can work on tweaking the interfaces and shifting data around in the tables.
I have verified that this works: I can store time series data, but I don't have a great way to confirm my data is valid without writing code. I am not opposed to that, but it does make validation a bit harder during testing. I am also observing that the resulting file is larger than the original dss file the time series came from.
The larger size doesn't surprise me for this initial cut. The Blocked storage takes a year of data and just does gzip on it. Depending on the interval that may not be very efficient. A year of 15 minute rain: probably good. Daily flow: probably not. Once we have more varied and real data available we can start measuring and making improvements.
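As a rough illustration of why the interval matters, here is a small Python sketch that packs a year of values as 8-byte doubles and gzips the block. This is an assumption about the general technique only; the actual on-disk layout in the branch may differ, the helper names `compress_block` and `decompress_block` are hypothetical, and both data sets are synthetic stand-ins rather than real measurements.

```python
import gzip
import random
import struct


def compress_block(values):
    """Pack values as little-endian doubles and gzip the whole block."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return raw, gzip.compress(raw)


def decompress_block(blob):
    """Reverse compress_block: gunzip, then unpack the doubles."""
    raw = gzip.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# Synthetic year of 15-minute rain: 365 * 96 values, almost all zero.
rain = [0.0] * (365 * 96)
for i in range(0, len(rain), 400):
    rain[i] = 0.25

# Synthetic year of daily flow: 365 values with no repeated structure.
rng = random.Random(0)
flow = [rng.uniform(100.0, 5000.0) for _ in range(365)]

rain_raw, rain_blob = compress_block(rain)
flow_raw, flow_blob = compress_block(flow)

# Compressed size as a fraction of raw size; smaller is better.
rain_ratio = len(rain_blob) / len(rain_raw)
flow_ratio = len(flow_blob) / len(flow_raw)
print(f"rain: {len(rain_raw)} -> {len(rain_blob)} bytes ({rain_ratio:.3f})")
print(f"flow: {len(flow_raw)} -> {len(flow_blob)} bytes ({flow_ratio:.3f})")
```

The long runs of zeros in the rain block give gzip a lot to work with, while a short block of varying doubles leaves little redundancy to remove. Real flow data will be smoother than this random series, so treat the numbers as directional only.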
Please update the table model to allow for regular time series