Open seanlaw opened 4 years ago
@seanlaw:
TS: Time Series
Could you please elaborate more on the purpose of the ML model?
Let's say you have 365-day-by-24-hour data, and you reshape it into one long time series of length 365 × 24 = 8,760. If the goal is clustering those 365 days, I don't think the matrix profile can help with that. If the goal is forecasting, a matrix profile MIGHT be helpful. For instance, to predict TS[h], it can find subsequences similar to TS[(h-L):h], where L is user-defined and can be tuned. That MIGHT help produce a more accurate forecast. (However, there are some other challenges that we can discuss further if you are interested.)
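To make the forecasting idea concrete, here is a minimal sketch in plain Python (no matrix profile library involved). It takes the last L points, TS[(h-L):h], as a query, searches the earlier part of the series for the most similar window under z-normalized Euclidean distance, and uses the value that followed that window as the forecast for TS[h]. The names `znorm` and `forecast_next` and the toy series are hypothetical, and copying the neighbor's next value as-is is only one naive choice among many.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def forecast_next(ts, L):
    """Naively forecast ts[h] (h = len(ts)) from the best match to ts[h-L:h].

    Returns (forecast, start index of the best-matching window, its distance).
    """
    h = len(ts)
    if L < 1 or h < 2 * L:
        raise ValueError("need L >= 1 and at least 2 * L points")
    query = znorm(ts[h - L:h])
    best_i, best_dist = None, math.inf
    # Only consider windows that end before the query starts, so the
    # query cannot trivially match itself or an overlapping neighbor.
    for i in range(h - 2 * L + 1):
        window = znorm(ts[i:i + L])
        dist = math.sqrt(sum((q - w) ** 2 for q, w in zip(query, window)))
        if dist < best_dist:
            best_i, best_dist = i, dist
    # Assumption: the value that followed the best match is the forecast.
    return ts[best_i + L], best_i, best_dist


# Toy series (made up): a repeating 0, 1, 2, 3 pattern.
ts = [0, 1, 2, 3] * 5
forecast, match_start, match_dist = forecast_next(ts, L=4)
```

The brute-force loop is only meant to show the logic. On real data the window length L and the way the neighbor's continuation is rescaled would both need tuning, and a matrix profile library would do the similarity search far faster.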
> Could you please elaborate more on the purpose of the ML model?
I'm not a big fan of unsupervised methods, so this would be targeted at supervised learning approaches where you have labels. Let me provide a completely fictitious and hypothetical example:
Imagine that you have 1,000 engineering students in college and they give you access to the GPS positional data from their cell phones (let's simplify and only worry about the latitude). During their 4 years there, they walk across campus every day and attend classes (the same classes every semester). We also know what their "major" is (i.e., which engineering degree they are going to graduate with), and this is the supervised "label".
So, assuming that the locations of the classes never change, given the GPS coordinates of another student not in this set, do you think you can predict which "major" this student is in after 2-3 years of data?
In this case, I would take the data from the 1,000 students and try to identify common motifs amongst the students (I believe that some may also refer to these as "shapelets" in a specific context). Then, I could/would try to use the presence/absence of those motifs (i.e., one-hot encoding) in training an ML model to predict their "major".
Let me know if that example makes sense. Again, I think it's important for me to point out that "GitHub issues" are always open for debate and maybe you can convince me that this is a terrible idea! I would be happy to close this issue if it is not useful or feasible. 😄 There is no ego here. 👍
@seanlaw It sounds interesting. I will think about it for sure.
The following paper shows how to create a feature matrix for classification by using pairwise distances. https://www.researchgate.net/profile/Rohit-Kate/publication/276422351_Using_dynamic_time_warping_distances_as_features_for_improved_time_series_classification/links/5c0ed52892851c39ebe437b5/Using-dynamic-time-warping-distances-as-features-for-improved-time-series-classification.pdf
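As a rough illustration of the distances-as-features idea (a sketch of the general approach, not the paper's exact procedure), each time series can be represented by its vector of DTW distances to a set of reference series, and those vectors can then be fed to any ordinary classifier. The function names `dtw_distance` and `distance_features`, the absolute-difference cost, and the toy data are all assumptions made for this example.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also aligns with b[j-1]'s predecessor path
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def distance_features(series, references):
    """One row per series, one column per reference series."""
    return [[dtw_distance(s, r) for r in references] for s in series]


# Made-up reference series: one rising, one falling.
references = [[1, 2, 3, 4], [4, 3, 2, 1]]
# A new series that rises with a repeated first value.
features = distance_features([[1, 1, 2, 3, 4]], references)
```

Here the single row in `features` has a zero in the first column (the new series warps perfectly onto the rising reference) and a positive value in the second, which is the kind of signal a downstream classifier could pick up on.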
Assuming that you've identified some interesting motifs across multiple time series, one could potentially represent those motifs in the form of one-hot (or multi-hot) encoding and use them to train a machine learning model.
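A minimal sketch of that encoding, assuming the motifs have already been found by some other means: for each time series, compute the smallest z-normalized Euclidean distance between each motif and any window of the series, and mark the motif as "present" when that distance falls under a threshold. The helper names (`znorm`, `min_distance`, `multi_hot`), the threshold value, and the toy "peak" and "dip" motifs are all hypothetical.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def min_distance(motif, ts):
    """Smallest z-normalized distance between motif and any window of ts."""
    m = len(motif)
    q = znorm(motif)
    best = math.inf
    for i in range(len(ts) - m + 1):
        w = znorm(ts[i:i + m])
        best = min(best, math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w))))
    return best


def multi_hot(ts, motifs, threshold):
    """1 if a motif occurs somewhere in ts (within threshold), else 0."""
    return [1 if min_distance(motif, ts) <= threshold else 0 for motif in motifs]


# Made-up motifs: a "peak" and a "dip".
motifs = [[0, 1, 0], [0, -1, 0]]
# Made-up series: peak only, peak and dip, dip only.
series = [
    [0, 0, 2, 0, 0, 0],
    [0, 0, -3, 0, 1, 0],
    [5, 5, -1, 5, 5, 5],
]
# One binary feature row per series; threshold 0.5 is an arbitrary choice.
X = [multi_hot(ts, motifs, threshold=0.5) for ts in series]
```

Each row of `X` can then be paired with its label (the "major" in the example above) and passed to a supervised model. The threshold is the main knob here, and picking it well would likely need validation on labeled data.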