Timeseries with missing data

Hello, I hope you are well.

First of all, amazing work. I especially enjoyed reading the paper; it was very well written and easy to understand.

I'm working with an open-source internet activity data set. It's fairly small, with hourly recordings over only 5 weeks. Based on my understanding of the data format, I used the week of the year as an attribute and the hourly measurements as the feature. The results were pretty impressive; below is a simple comparison plot of the real (orange) and generated (blue) sequences.

[Plot: real (orange) vs. generated (blue) sequences]
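For reference, the weekly layout described here can be produced from a flat hourly series roughly as follows. This is a sketch that assumes the recordings start on a week boundary and contain a whole number of 168-hour weeks; `to_weekly_rows`, `first_week`, and the random stand-in data are hypothetical.

```python
import numpy as np

HOURS_PER_WEEK = 24 * 7  # 168


def to_weekly_rows(hourly, first_week):
    """Split a flat hourly series into (n_weeks, 168) rows plus week-of-year labels.

    Assumes len(hourly) is a multiple of 168 and the series starts on a week boundary.
    """
    hourly = np.asarray(hourly, dtype=float)
    if hourly.size % HOURS_PER_WEEK != 0:
        raise ValueError("series length must be a multiple of 168")
    weeks = hourly.reshape(-1, HOURS_PER_WEEK)              # one row per week
    week_of_year = first_week + np.arange(weeks.shape[0])   # attribute column
    return week_of_year, weeks


# Hypothetical stand-in for 5 weeks of hourly activity counts.
rng = np.random.default_rng(0)
hourly = rng.integers(100, 900, size=5 * HOURS_PER_WEEK)
week_of_year, weeks = to_weekly_rows(hourly, first_week=0)
print(week_of_year)  # [0 1 2 3 4]
print(weeks.shape)   # (5, 168)
```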

My data format looked something like this:

Week of year       0     1     2     3   ...   167
0                123   456   678   567   ...   234
1                345   890   787   122   ...   345
...              ...   ...   ...   ...   ...   ...
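A rough sketch of how that weekly table might be packed into the arrays DoppelGANger trains on. The layout used here (`data_feature` of shape N × T × F, `data_attribute` of shape N × A with discrete attributes one-hot encoded, `data_gen_flag` of shape N × T) is my reading of the repository README and should be checked against it. The helper `to_dg_arrays` and the global [0, 1] min-max scaling are assumptions; writing the `.npz` file and the output-description pickles is left out.

```python
import numpy as np


def to_dg_arrays(week_of_year, weeks):
    """Pack the weekly table into (feature, attribute, gen_flag) arrays.

    Assumed layout (check against the repo README): data_feature is (N, T, F),
    data_attribute is (N, A) with discrete attributes one-hot encoded,
    data_gen_flag is (N, T).
    """
    weeks = np.asarray(weeks, dtype=float)
    n, t = weeks.shape

    # Min-max scale the measurements to [0, 1] using the global range (assumption).
    lo, hi = weeks.min(), weeks.max()
    scaled = (weeks - lo) / (hi - lo) if hi > lo else np.zeros_like(weeks)
    data_feature = scaled[:, :, None]                  # (N, 168, 1): one feature

    # One-hot encode week-of-year as a discrete attribute.
    categories, codes = np.unique(week_of_year, return_inverse=True)
    data_attribute = np.eye(len(categories))[codes]    # (N, n_categories)

    # Every hour is observed, so every step is active.
    data_gen_flag = np.ones((n, t))
    return data_feature, data_attribute, data_gen_flag, (lo, hi)


weeks = np.arange(2 * 168, dtype=float).reshape(2, 168)
feat, attr, flag, (lo, hi) = to_dg_arrays([0, 1], weeks)
print(feat.shape, attr.shape, flag.shape)  # (2, 168, 1) (2, 2) (2, 168)
```

Keeping `(lo, hi)` around makes it possible to map generated values back to the original scale later.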

For now, each week has all 168 hours, so the series length is constant and every step is active. This made the data preprocessing fairly simple. Now suppose I randomly removed some hourly values from each week and then trained DG on the resulting partial time series.
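The experiment described above, randomly removing hours, can be simulated with a NaN mask. This sketch assumes each hour is dropped independently with probability `frac`, so different weeks end up with different numbers of observed hours; `drop_random_hours` is a hypothetical helper.

```python
import numpy as np


def drop_random_hours(weeks, frac, seed=0):
    """Return a copy of `weeks` where each hour is set to NaN with probability `frac`."""
    rng = np.random.default_rng(seed)
    partial = np.asarray(weeks, dtype=float).copy()
    mask = rng.random(partial.shape) < frac  # True = this hour is treated as missing
    partial[mask] = np.nan
    return partial


weeks = np.ones((5, 168))
partial = drop_random_hours(weeks, frac=0.2, seed=42)
observed_per_week = (~np.isnan(partial)).sum(axis=1)
print(observed_per_week)  # observed hours in each week (usually different per week)
```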

Can DG produce values for every hour that I could then use to fill in the gaps in the original input? To put it another way: if each week had a different series length, could DG produce the full 168 hours for each week from the hourly values it does get?
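Mechanically, filling the gaps is simple once a full 168-hour week has been generated and mapped back to the original scale: take the generated value wherever the original is missing. One caveat, as far as I understand the model: a generated week is sampled given only its attributes, not the observed hours of a specific real week, so the filled values are plausible draws rather than a conditional imputation of that particular week. `fill_gaps` is a hypothetical helper.

```python
import numpy as np


def fill_gaps(partial_week, generated_week):
    """Replace NaN hours in `partial_week` with the same hours from `generated_week`.

    Both arrays are assumed to have the same length and to be on the original scale.
    """
    partial_week = np.asarray(partial_week, dtype=float)
    generated_week = np.asarray(generated_week, dtype=float)
    if partial_week.shape != generated_week.shape:
        raise ValueError("weeks must have the same shape")
    # Keep observed values; use generated values only where data is missing.
    return np.where(np.isnan(partial_week), generated_week, partial_week)


observed = np.array([1.0, np.nan, 3.0, np.nan])
generated = np.array([9.0, 2.0, 9.0, 4.0])
print(fill_gaps(observed, generated))  # [1. 2. 3. 4.]
```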

If so, what would the preprocessing for such a data set look like?

My idea is to add the hour as an additional feature, but I'm not sure whether I should drop the hours with missing values and let DG pad the end of the time series, as it normally does, or do something else. I'm also not sure how this would be reflected in data_gen_flag. Would I just mark the time series as 'off' for those values?
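One possible version of the idea above, sketched under assumptions: drop the missing hours, keep the hour-of-week as a second feature so each step still records which hour it is, and pad the end of the series. My understanding is that `data_gen_flag` marks the series length (a run of 1s followed by 0s), so it cannot mark a gap in the middle as 'off'; compacting the observed hours to the front avoids that. `compact_week`, `expand_week`, and the [0, 1] scaling of the hour are hypothetical choices, and the values are assumed to be normalized beforehand.

```python
import numpy as np

HOURS_PER_WEEK = 168


def compact_week(partial_week):
    """Drop missing hours, add hour-of-week as a second feature, and pad at the end.

    Returns (features, gen_flag): features is (168, 2) holding
    [value, hour_of_week / 167]; gen_flag is 1 for the observed prefix, 0 for padding.
    """
    partial_week = np.asarray(partial_week, dtype=float)
    hours = np.flatnonzero(~np.isnan(partial_week))  # observed hour indices, in order
    n_obs = hours.size

    features = np.zeros((HOURS_PER_WEEK, 2))
    features[:n_obs, 0] = partial_week[hours]            # measured values
    features[:n_obs, 1] = hours / (HOURS_PER_WEEK - 1)   # hour-of-week scaled to [0, 1]

    gen_flag = np.zeros(HOURS_PER_WEEK)
    gen_flag[:n_obs] = 1.0                               # contiguous prefix of active steps
    return features, gen_flag


def expand_week(features, gen_flag):
    """Inverse mapping: put active rows back at their hour-of-week, NaN elsewhere."""
    week = np.full(HOURS_PER_WEEK, np.nan)
    active = gen_flag > 0.5
    hours = np.rint(features[active, 1] * (HOURS_PER_WEEK - 1)).astype(int)
    hours = np.clip(hours, 0, HOURS_PER_WEEK - 1)        # guard against out-of-range output
    week[hours] = features[active, 0]                    # later duplicates overwrite earlier
    return week


week = np.arange(HOURS_PER_WEEK, dtype=float)
week[[5, 50, 100]] = np.nan
feats, flag = compact_week(week)
print(int(flag.sum()))  # 165
restored = expand_week(feats, flag)
print(np.array_equal(np.isnan(restored), np.isnan(week)))  # True
```

The trade-off is that step i no longer means hour i, so the model has to learn the hour feature correctly. If generated hours collide, `expand_week` keeps the later one, and any hour that is never generated stays NaN.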

I hope this makes sense. I'd love to hear your opinion on whether this sort of generation is possible, along with a rough idea of which time attributes/features should be included to improve the results. Thank you!

fjxmlzn / DoppelGANger, issue #18