dkesada / dbnR

Gaussian dynamic Bayesian networks structure learning and inference based on the bnlearn package
GNU General Public License v3.0

What to do when we have data for t0,t1,t2...tn and want to predict t(n+1) #4

Closed shubhrampandey closed 4 years ago

shubhrampandey commented 4 years ago

Hi,

Please let me know whether I can use this package when we have data for t0, t1, t2, ..., tn and want to predict tn+1. If so, how?

Thanks a lot!!!

dkesada commented 4 years ago

Hello, when you learn a DBN model from data you have to supply a 'size' parameter that determines how many past time instants tn, tn-1, tn-2, ..., tn-size are used to predict tn+1. By default, DBN models use tn and tn-1 to predict tn+1, but you can increase that. Keep in mind that the bigger the 'size' parameter, the bigger the resulting network and the slower the structure learning. Once you have learned the network for a given size, you can use it to forecast tn+1, tn+2 and so forth.

Reasonable size values usually range from 2 to 6, mainly because it is not very common to have time series data with an autoregressive order higher than 6, except in some areas like finance. If you have, say, n = 1000, you shouldn't expect value 1001 to depend on all of the previous values. You may well get an accurate forecast using only instances 998, 999 and 1000. You have to assume that, given enough time in between, the instances become independent of one another.
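To make the role of 'size' more concrete, here is a minimal sketch of what folding a time series into lagged columns can look like. It is written in plain Python for illustration only and is not dbnR's implementation: the function name `fold_series` and the `<var>_t_<lag>` column naming are assumptions made for this example.

```python
def fold_series(rows, size):
    """Turn time-ordered rows into windows covering `size` consecutive instants.

    rows: list of dicts, one dict per time instant, oldest first.
    Returns a list of dicts keyed "<var>_t_<lag>" (hypothetical naming),
    where lag 0 is the most recent instant in the window.
    """
    if size < 2:
        raise ValueError("size must be at least 2")
    folded = []
    # Each window ends at index `end` and reaches back size - 1 instants.
    for end in range(size - 1, len(rows)):
        window = {}
        for lag in range(size):
            for var, value in rows[end - lag].items():
                window[f"{var}_t_{lag}"] = value
        folded.append(window)
    return folded


# Toy series with two variables observed at four instants.
series = [
    {"X1": 1.0, "X2": 10.0},
    {"X1": 2.0, "X2": 20.0},
    {"X1": 3.0, "X2": 30.0},
    {"X1": 4.0, "X2": 40.0},
]
folded = fold_series(series, size=2)
```

With four instants and a window of 2, this yields three rows, each holding the current value and the previous value of every variable. A larger window adds one more set of columns per extra instant, which is why the network and the learning time grow with 'size'.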

I hope I could be of help. Cheers!

shubhrampandey commented 4 years ago

Hi, thanks for the prompt and detailed response. Actually, I was asking whether I can use your package to train a BN model on my data, where I have columns of features (X1, X2, X3, ..., Xn) at 5 different time points. For example: X1t0...Xnt0...X2t0....X2t0....X5t0...X5tn. After that, I want a prediction for X1t6...Xnt6. I hope I have stated my problem clearly. Please let me know if you have any suggestions.

Thanks a lot

dkesada commented 4 years ago

If I understand correctly, the problem is that you already have your features divided by time, in the fashion X1_t0, X1_t1, ..., X1_tn, X2_t0, X2_t1, ..., X2_tn, ..., X5_t0, X5_t1, ..., X5_tn and you want to predict t6 for all the variables. The only issue is that my package already handles the time separation of the features internally to properly format the dataset to learn and predict with the DBN.

In your case, you would have to take the columns [X1_t0, X2_t0, X3_t0, X4_t0, X5_t0] and create a new dataset with only those, since you only need one column for each variable. Then learn a DBN from that dataset (folding it with the 'fold_dt' function) and forecast with the parameters ini = 5 and len = 1 to predict all the variables at t6.
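One possible reading of "one column for each variable" is a table in which each row is a time instant and each column is a variable. The sketch below regroups a record whose keys are already split by time into that shape. It is plain Python for illustration, not part of dbnR; the helper name `wide_to_long` and the `<var>_t<step>` key pattern are assumptions based on the column names used in this thread.

```python
import re


def wide_to_long(record):
    """Regroup keys like "X1_t0" into one row per time step, one key per variable.

    Assumes every variable is observed at every time step; a missing
    combination raises KeyError.
    """
    pattern = re.compile(r"^(?P<var>.+)_t(?P<step>\d+)$")
    by_var = {}
    for key, value in record.items():
        match = pattern.match(key)
        if match is None:
            raise ValueError(f"unexpected column name: {key}")
        by_var.setdefault(match["var"], {})[int(match["step"])] = value
    # Collect all time steps and emit rows oldest first.
    steps = sorted({s for values in by_var.values() for s in values})
    return [{var: by_var[var][s] for var in sorted(by_var)} for s in steps]


# Toy record: two variables, three time steps, already split by time.
wide = {
    "X1_t0": 0.1, "X1_t1": 0.2, "X1_t2": 0.3,
    "X2_t0": 5.0, "X2_t1": 6.0, "X2_t2": 7.0,
}
long_rows = wide_to_long(wide)
```

The result has one row per time step with the keys X1 and X2, which is the kind of time-ordered, one-column-per-variable table that a folding step expects as input.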

I hope my answer was more on point this time.

shubhrampandey commented 4 years ago

Thank you so much for your reply. I am working on that, but I need your guidance. Please find attached a sample file of the data. Please have a look and let me know what you would suggest. Thanks. x.xlsx

shubhrampandey commented 4 years ago

Also, I think your package does not work on mixed datasets; mine contains both continuous and categorical variables.

dkesada commented 4 years ago

Yeah, my package only supports continuous datasets, sorry

shubhrampandey commented 4 years ago

No worries, that's not an issue. Do you have any idea how we can do this with bnlearn?

dkesada commented 4 years ago

If you want to use bnlearn, you would have to either discretize your variables or train a mixed net, which I think is possible now in bnlearn, though I have never done it. If you already have your variables divided by time, the fastest solution would be to treat it all as an unrolled DBN and do inference as if it were a normal BN. Those two options would probably be the most cost-efficient solutions if you are pressed for time.
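As a rough idea of what discretizing a continuous variable involves, here is a minimal equal-width binning sketch in plain Python. It only illustrates the concept and is not bnlearn's own discretization routine; the function name `equal_width_bins` and the choice of equal-width intervals are assumptions made for this example.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0 .. n_bins - 1.

    The range [min, max] is split into n_bins intervals of equal width;
    the maximum value is placed in the last bin.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Toy continuous column split into four levels.
temps = [0.0, 2.5, 5.0, 7.5, 10.0]
labels = equal_width_bins(temps, 4)
```

After a step like this, every column is categorical, so a discrete network can be learned over the whole dataset. The number of bins is a modelling choice: too few loses information, too many leaves few samples per level.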