Hi, question about how you dealt with sparsity.
In input_pipe.py, there are parameters like "train_completeness_threshold", which determines how many 0's (missing days) are allowed. It looks like the default value is 1. Further down in the code, there is:
self.max_train_empty = int(round(train_window * (1 - train_completeness_threshold)))
So with the default value of 1, this makes max_train_empty default to 0, i.e. the randomly cropped time series must be completely filled [no missing values] in order to be used in training.
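A minimal Python sketch of the filtering logic as I understand it. It assumes an "empty" day is a 0 or NaN and that a crop is kept when its empty count does not exceed max_train_empty. The helper names are hypothetical, not taken from input_pipe.py; only the max_train_empty formula is copied from the code.

```python
import math

def max_train_empty(train_window, train_completeness_threshold):
    # Same formula as in input_pipe.py: number of empty days tolerated per crop.
    return int(round(train_window * (1 - train_completeness_threshold)))

def crop_is_usable(crop, train_completeness_threshold):
    # Hypothetical check: a day counts as empty if it is 0 or NaN (assumption).
    empty = sum(1 for v in crop if v == 0 or (isinstance(v, float) and math.isnan(v)))
    # Assumed keep rule: at most max_train_empty empty days in the crop.
    return empty <= max_train_empty(len(crop), train_completeness_threshold)

# With the default threshold of 1, no empty days are tolerated.
print(max_train_empty(100, 1.0))                   # 0
print(crop_is_usable([3.0, 5.0, 2.0, 7.0], 1.0))   # True
print(crop_is_usable([3.0, 0.0, 2.0, 7.0], 1.0))   # False
```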
So is this what you did to get your best results: discard any time series crop that had holes in it?
Of the ~145 thousand time series in train_1.csv, about 2/3 appear to be dense [no missing values]. Any random crop of a dense series stays dense, and a random crop of a series with holes may still land on a dense stretch, so I guess even with max_train_empty = 0 you still get to use most of the data, right?
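To check the intuition that a series with holes can still yield fully dense crops, here is a small sketch that computes the fraction of crop start positions whose window contains no missing day. The function name and the boolean "missing" mask are illustrative assumptions, not part of input_pipe.py.

```python
def dense_crop_fraction(missing, window):
    """Fraction of possible crop start positions whose window has no missing day.

    missing: list of bools, True where the series value is missing (hypothetical mask).
    window: crop length in days.
    """
    n = len(missing)
    if window > n:
        return 0.0
    starts = n - window + 1
    # Number of missing days inside the first window.
    count = sum(missing[:window])
    dense = 1 if count == 0 else 0
    # Slide the window one day at a time, updating the missing-day count.
    for start in range(1, starts):
        count += missing[start + window - 1] - missing[start - 1]
        if count == 0:
            dense += 1
    return dense / starts

# A 10-day series whose first 3 days are missing, cropped to 5 days:
# crops starting at day 3, 4 or 5 are dense, i.e. 3 of the 6 possible crops.
print(dense_crop_fraction([True] * 3 + [False] * 7, 5))  # 0.5
```

A fully dense series always gives 1.0, so with max_train_empty = 0 the loss of data comes only from series with holes, and only from the crops that overlap a hole.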
The final version uses train_completeness_threshold=0.01, which filters out only almost-empty series. I found that sparseness has a regularizing effect on training, so there is no reason to filter out sparse series.
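Plugging the final value into the max_train_empty formula from input_pipe.py shows how permissive it is. The sketch uses a hypothetical train_window of 100 days (for illustration only, not necessarily the window used in the final model).

```python
def max_train_empty(train_window, train_completeness_threshold):
    # Formula from input_pipe.py: number of empty days tolerated per crop.
    return int(round(train_window * (1 - train_completeness_threshold)))

train_window = 100  # hypothetical window length, for illustration only
for threshold in (1.0, 0.5, 0.01):
    print(threshold, max_train_empty(train_window, threshold))
# 1.0 0
# 0.5 50
# 0.01 99
```

Assuming a crop is kept when its number of empty days is at most max_train_empty, a 100-day crop with threshold 0.01 is rejected only when all 100 days are empty, which matches "filters out only almost-empty series".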