Open florian-huber opened 3 years ago
Possible categories to consider:

- `univariate` (only one time-dependent variable) vs. `multivariate` (> 1 time-dependent variable)
  - `multivariate` could also be further divided into: `same-type channels` (e.g. EEG -> all channels are similar types of signals) vs. `different-type channels`
- `absolute time` (precise position in time matters) vs. `relative time` (translational invariance, but potentially correlated across channels --> "same time" events or events with a particular distance) vs. `time independent`
- `absolute channel` (it matters in which channel something happens) vs. `relative channel`
- `local pattern` (e.g. specific peak) vs. `global pattern` (frequency, variance, trend etc.)
- `numerical` vs. `categorical` data

So far, the list above contains some redundancies:

- `different-type channels` also implies `absolute channel` (but `same-type channels` could lead to both)

Maybe it is also good to decide that we focus on time series classification. And if we use such categories to assess a model's performance at classifying time series, we could also think of other aspects, e.g.:

- number of classes?
- number and/or dimension of samples?

Data type | Description | Link to example data set | multivariate / univariate | absolute/relative time | same-type/different-type | absolute/relative channel | local/global pattern |
---|---|---|---|---|---|---|---|
EEG | data from electrodes placed on scalp | ... | multivariate | can be both | same-type | absolute channel | can be both |
Wearable motion-sensor data | accelerometer and gyroscope data | ... | multivariate | can be both | different-type | absolute channel? | can be both |

Dataset | Description | Link to dataset | Citation | multivariate / univariate | time structure | same-type/different-type | absolute/relative channel | local/global pattern |
---|---|---|---|---|---|---|---|---|
3W Dataset | Various sensor data to detect rare undesirable real events in oil wells | https://github.com/ricardovvargas/3w_dataset | https://doi.org/10.1016/j.petrol.2019.106223 | multivariate | relative time | different-type | absolute channel | local ? |
Gas sensors for home activity monitoring | MOX gas sensors, and a temperature and humidity sensor | https://archive.ics.uci.edu/ml/datasets/Gas+sensors+for+home+activity+monitoring | see link | multivariate | ? | different-type | absolute channel | ? |
EEG Steady-State Visual Evoked Potential | EEG data | https://archive.ics.uci.edu/ml/datasets/EEG+Steady-State+Visual+Evoked+Potential+Signals# | see link | multivariate | ? | same-type | absolute channel | ? |
Human Activity Recognition from Continuous Ambient Sensor Data | Various "smart home" sensors | https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+from+Continuous+Ambient+Sensor+Data | see link | multivariate | ? | different-type | absolute channel | ? |
Air Quality Data Set | Various sensor data | https://archive.ics.uci.edu/ml/datasets/Air+Quality | https://www.sciencedirect.com/science/article/abs/pii/S0925400507007691 | multivariate | ? | different-type | absolute channel | ? |
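
To make the categories above easier to compare across datasets, they could be written down as a small data structure. The following is only a sketch: the class names, field names, and the `open_questions` / `is_consistent` helpers are hypothetical and not part of the repository. Cells that are still `?` in the table are stored as `None`, and the only rule checked is the redundancy noted above (different-type channels imply absolute channel).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Variables(Enum):
    UNIVARIATE = "univariate"
    MULTIVARIATE = "multivariate"


class TimeStructure(Enum):
    ABSOLUTE = "absolute time"
    RELATIVE = "relative time"
    INDEPENDENT = "time independent"


class ChannelTypes(Enum):
    SAME_TYPE = "same-type"
    DIFFERENT_TYPE = "different-type"


class ChannelRole(Enum):
    ABSOLUTE = "absolute channel"
    RELATIVE = "relative channel"


class PatternScale(Enum):
    LOCAL = "local pattern"
    GLOBAL = "global pattern"


@dataclass(frozen=True)
class DatasetProfile:
    """One row of the dataset table; None means the cell is still '?'."""
    name: str
    variables: Variables
    time_structure: Optional[TimeStructure] = None
    channel_types: Optional[ChannelTypes] = None
    channel_role: Optional[ChannelRole] = None
    pattern_scale: Optional[PatternScale] = None

    def open_questions(self):
        """Names of the categories that have not been decided yet."""
        return [field for field, value in vars(self).items() if value is None]

    def is_consistent(self):
        """Different-type channels imply absolute channel (see redundancy note)."""
        return not (
            self.channel_types is ChannelTypes.DIFFERENT_TYPE
            and self.channel_role is ChannelRole.RELATIVE
        )


# Two rows copied from the table above.
three_w = DatasetProfile(
    name="3W Dataset",
    variables=Variables.MULTIVARIATE,
    time_structure=TimeStructure.RELATIVE,
    channel_types=ChannelTypes.DIFFERENT_TYPE,
    channel_role=ChannelRole.ABSOLUTE,
    pattern_scale=PatternScale.LOCAL,  # tentative: listed as "local ?" in the table
)
gas_sensors = DatasetProfile(
    name="Gas sensors for home activity monitoring",
    variables=Variables.MULTIVARIATE,
    channel_types=ChannelTypes.DIFFERENT_TYPE,
    channel_role=ChannelRole.ABSOLUTE,
)

print(gas_sensors.open_questions())
```

Keeping undecided cells as `None` instead of guessing a value makes it easy to list what still needs to be filled in for each dataset.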
Related to https://github.com/epodium/time_series_generator/issues/5#issuecomment-788773321
- `absolute time` and `relative time` could probably be treated the same, by defining them as relative to a reference event (e.g. the origin of the time axis, an event in another time series, an event internal to the time series, etc.)
- `local pattern` and `global pattern` are arbitrary; the distinction has more to do with how a process is sampled. Probably a more workable paradigm is to have users define the size of certain events with respect to time as well as with respect to what is on the vertical axis. This would also take care of being able to deal with events of a certain duration.
- `numerical`, `categorical`, etc.: I believe this is referred to as 'scales'. Some other scales are `ordinal`, `nominal`, `interval`, `ratio`, etc.
- `channels` and `n_ch` should be kept separate from the `signal`|`noise` definitions. I'd prefer to […] `signal1`, `signal2`, `noise1`
- In `definitions`, you could use names like `random_walk`, `gaussian`, etc., like we're doing now with `signal_type`. Each of these would need to be shorthand for an implementation somewhere (Python or elsewhere), and would take its function parameters from the corresponding yaml definition. This would mean that each of the clauses here https://github.com/epodium/time_series_generator/blob/63923207204bbb09e04ea01d0f6ccf5f7a022842/ts_generator/TS_generator.py#L268-L332 and here https://github.com/epodium/time_series_generator/blob/63923207204bbb09e04ea01d0f6ccf5f7a022842/ts_generator/TS_generator.py#L335-L362 would become an individual function whose parameters are passed by `kwargs` taken from the yaml.
- […] `channels: [1, 2, 3]`. Perhaps this section of the yaml could be named `composition`. Or just `channels`.
- `signal_def` and `noise_def` can be merged into `definitions`. We could optionally introduce a key `stochastic: bool` for each definition if we need to differentiate between these 2 types of model, not sure yet.
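
The idea of names that are shorthand for an implementation, with parameters passed as `kwargs` from the yaml, could look roughly like the sketch below. Everything here is an assumption for illustration: the registry, the `type` key, the parameter names (`step_std`, `mean`, `std`), and the layout of `channels` are hypothetical and do not reflect the current `TS_generator.py`. A plain dict stands in for the parsed yaml so the example runs without extra dependencies.

```python
import random

# Hypothetical registry mapping shorthand names to implementations.
DEFINITION_REGISTRY = {}


def register(name):
    """Decorator that stores a generator function under a shorthand name."""
    def decorator(func):
        DEFINITION_REGISTRY[name] = func
        return func
    return decorator


@register("random_walk")
def random_walk(n_samples, rng, step_std=1.0):
    """Cumulative sum of Gaussian steps."""
    value, series = 0.0, []
    for _ in range(n_samples):
        value += rng.gauss(0.0, step_std)
        series.append(value)
    return series


@register("gaussian")
def gaussian(n_samples, rng, mean=0.0, std=1.0):
    """Independent Gaussian samples."""
    return [rng.gauss(mean, std) for _ in range(n_samples)]


# Stand-in for the parsed yaml (assumed layout, not the existing format).
config = {
    "definitions": {
        "signal1": {"type": "random_walk", "stochastic": True, "step_std": 0.5},
        "noise1": {"type": "gaussian", "stochastic": True, "std": 0.1},
    },
    # Which definitions are summed in which channel.
    "channels": {
        1: ["signal1", "noise1"],
        2: ["noise1"],
    },
}


def generate(config, n_samples, seed=0):
    """Build each channel by summing the components listed for it."""
    rng = random.Random(seed)
    output = {}
    for channel, names in config["channels"].items():
        total = [0.0] * n_samples
        for name in names:
            params = dict(config["definitions"][name])
            func = DEFINITION_REGISTRY[params.pop("type")]
            params.pop("stochastic", None)  # metadata, not a function argument
            component = func(n_samples, rng, **params)
            total = [a + b for a, b in zip(total, component)]
        output[channel] = total
    return output


series = generate(config, n_samples=100)
print({channel: len(values) for channel, values in series.items()})
```

With this kind of layout, adding a new signal or noise model would only need a new registered function; the yaml then refers to it by name and supplies its parameters.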
Time series are everywhere and they can include a lot of different things. To better address this field and communicate our work, it is important to structure this a bit.
This issue is also meant for looking at actual time series data and checking what could be relevant. Possible resources are: