Open Mr-Wu-H opened 2 years ago
Hello,
in their paper the authors just write Next, we calculate the standard deviation value of these adjacent discrepancies. At last, we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation.
.
Unfortunately, I have no justified argument for you on how to best choose a value for std_split
. The obvious thing is, with higher values you sample fewer time series as you pack more and more distinct time series into the same subclass.
Maybe visualizing the adjacent discrepancies along with different split values might give you an indication.
Thank you very much for your reply.
------------------ 原始邮件 ------------------ 发件人: "benibaeumle/FSS-Algorithm" @.>; 发送时间: 2022年4月13日(星期三) 下午5:32 @.>; @.**@.>; 主题: Re: [benibaeumle/FSS-Algorithm] Hello, could you tell me how to calculate this value, std_split ? (Issue #1)
Hello, in their paper the authors just write Next, we calculate the standard deviation value of these adjacent discrepancies. At last, we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation.. Unfortunately, I have no justified argument for you on how to best choose a value for std_split. The obvious thing is, with higher values you sample fewer time series as you pack more and more distinct time series into the same subclass. Maybe visualizing the adjacent discrepancies along with different split values might give you an indication.
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>
Hello, in their paper the authors just write
Next, we calculate the standard deviation value of these adjacent discrepancies. At last, we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation.
. Unfortunately, I have no justified argument for you on how to best choose a value forstd_split
. The obvious thing is, with higher values you sample fewer time series as you pack more and more distinct time series into the same subclass. Maybe visualizing the adjacent discrepancies along with different split values might give you an indication.
Hello, this sentence“we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation” means that remove the part greater than standard deviation and the rest time series as the sample time series.Am I right?
Hello, this method FastShapeletCandidates is to get shapelet candidates of one class, right?
Hello, in their paper the authors just write
Next, we calculate the standard deviation value of these adjacent discrepancies. At last, we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation.
. Unfortunately, I have no justified argument for you on how to best choose a value forstd_split
. The obvious thing is, with higher values you sample fewer time series as you pack more and more distinct time series into the same subclass. Maybe visualizing the adjacent discrepancies along with different split values might give you an indication.Hello, this sentence“we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation” means that remove the part greater than standard deviation and the rest time series as the sample time series.Am I right?
I am not sure if I understand you correctly. Please, see the paper chapter 3.1 for how this particular step is computed (I do not have Latex support when answering here, so having a look on the paper should be more comfortable for you). But in words, what is computed is:
Calculate the sum of the time steps of each time series
The result after computing the 8 steps above is the set of subclasses.
Hello, this method FastShapeletCandidates is to get shapelet candidates of one class, right?
Yes.
Thanks a lot.
Hello, in their paper the authors just write
Next, we calculate the standard deviation value of these adjacent discrepancies. At last, we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation.
. Unfortunately, I have no justified argument for you on how to best choose a value forstd_split
. The obvious thing is, with higher values you sample fewer time series as you pack more and more distinct time series into the same subclass. Maybe visualizing the adjacent discrepancies along with different split values might give you an indication.Hello, this sentence“we can separate the data into subclasses by splitting at the sequence that has difference larger than half of the computed standard deviation” means that remove the part greater than standard deviation and the rest time series as the sample time series.Am I right?
I am not sure if I understand you correctly. Please, see the paper chapter 3.1 for how this particular step is computed (I do not have Latex support when answering here, so having a look on the paper should be more comfortable for you). But in words, what is computed is:
- Calculate the sum of the time steps of each time series
- Calculate the mean over the sums
- Select the time series which is closest to the mean over the sums
- Calculate the euclidean distances of each time series to the time series we selected in 3. and sort the resulting list of distances
- Calculate the standard deviation of the differences between each pair of neighboring distances
- Now, for each pair in the sorted list of distances we check if the difference is larger than 1.5x the standard deviation we calculated in 5.
- If the standard deviation is larger than 1.5 we consider the time series between the last split point and the current split point as a subclass.
- Repeat 7 until we iterated over all neighboring distance pairs
The result after computing the 8 steps above is the set of subclasses.
Hello,how should I understand the last split point and the current split point in step 7?
See here.
std_split : float the standard deviation from the mean to subdivide the time series of a class into subclasses.