Closed prashasti19075 closed 11 months ago
Hi! I've looked into the pull request a bit, but I've been so busy with work lately that it took me a while to get back to you. I have some comments on it:
The idea of bootstrapping BNs is well established and widely used in the community, but in the case of DBNs it has some underlying issues and it is not so straightforward to apply. When we bootstrap a dataset, we create several resamples with replacement of that dataset, and one side effect of this is that the instances get randomly rearranged. This is a problem with time series datasets, where the position of each instance inside the dataset is relevant. In the dmmhc algorithm, two separate networks are learned consecutively: the static BN for each of the time slices and the transition network with the inter-slice arcs. In the case of the static BN this shouldn't be an issue, but the DBN structure requires the specific ordering of the instances, and directly applying the bootstrapping method from bnlearn can lead to problems. This point requires further study: in your code you are bootstrapping a folded dataset, which could be mitigating the issue, but I am not sure that it does not have unforeseen side effects.
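To make the folding point concrete, here is a minimal base-R sketch (not dbnR's actual `fold_dt` implementation, and `fold_two_slices` is an illustrative name) of why folding mitigates the ordering problem: after folding, each row holds both the t-1 and t values, so the inter-slice dependency travels with the row when rows are resampled with replacement.

```r
# Illustrative folding of a univariate/multivariate series into (t-1, t) rows.
# This is a sketch, not dbnR's fold_dt.
fold_two_slices <- function(dt) {
  n <- nrow(dt)
  folded <- cbind(dt[1:(n - 1), , drop = FALSE],   # values at t-1
                  dt[2:n,       , drop = FALSE])   # values at t
  names(folded) <- c(paste0(names(dt), "_t_1"), paste0(names(dt), "_t_0"))
  folded
}

dt <- data.frame(x = 1:5, y = 11:15)   # toy time series
f_dt <- fold_two_slices(dt)

# Resampling folded rows keeps each (t-1, t) pair intact:
set.seed(1)
boot_rows <- f_dt[sample(nrow(f_dt), replace = TRUE), ]
stopifnot(all(boot_rows$x_t_0 - boot_rows$x_t_1 == 1))
```

Whether this is enough to make bootstrapping safe for the transition network is exactly the open question: the pairs survive resampling, but the resample is no longer a contiguous trajectory.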
In your pull request, you set the default behaviour of the dmmhc algorithm to perform bootstrapping. This should not be the case; the base behaviour should be that of the original dmmhc algorithm, and bootstrapping should be enabled via a parameter when the user knows they want to learn the DBN structure that way. This is simple to fix by setting the parameter 'nboots' to -1 by default, but then the user does not have a default number of resamples, which can be problematic.
I'm not sure that the solution of performing bootstrapping with the default 2 resamples is any better than the default behaviour of learning a single DBN. The default in bnlearn is 200 resamples, but I can tell that is going to be too time consuming when learning DBNs. I do not know what number of resamples is a good average starting point, but some indication should be given to the user, so that they do not set a very high number of resamples and the algorithm never finishes computing the network.
On the topic of parameters, bnlearn accepts many of them to tune the bootstrapping process, but only the number of resamples and the >0.51 threshold check used to direct undirected arcs are present. It would take some time to evaluate whether all of those parameters should be exposed, or even whether implementing bootstrapping natively in dbnR would avoid complications in the long run.
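For reference, a hedged sketch of what exposing those knobs could look like. The wrapper name `boot_dmmhc_struct` is hypothetical (not part of dbnR or of the pull request); the parameters shown (`R`, `m`, `algorithm.args`, plus the `threshold` passed to `averaged.network`) are the bnlearn ones under discussion:

```r
# Hypothetical wrapper surfacing the bnlearn::boot.strength parameters that
# the pull request currently hard-codes or omits. Assumes a folded dataset
# f_dt and a score-based learner ("hc") on each resample.
boot_dmmhc_struct <- function(f_dt, nboots = 50, m = nrow(f_dt),
                              threshold = 0.51, algorithm.args = list()) {
  strength <- bnlearn::boot.strength(
    f_dt,
    R = nboots,                     # number of resamples (bnlearn default: 200)
    m = m,                          # size of each resample
    algorithm = "hc",               # learner applied to each resample
    algorithm.args = algorithm.args # e.g. score, whitelist, blacklist
  )
  # Keep arcs whose strength exceeds the threshold; arc directions come from
  # the direction probabilities stored in 'strength'.
  bnlearn::averaged.network(strength, threshold = threshold)
}
```

Note this is a sketch of the interface question only; it does not address the row-ordering concern discussed above.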
The use of the 'cluster' parameter adds a hidden dependency on the parallel package, which should be listed in dbnR's suggested package dependencies. I am not sure this dependency is worth having, because I have not compared the performance with and without the 'cluster' parameter.
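For context, `cluster` is a real argument of `bnlearn::boot.strength`, and `parallel` ships with base R (so Suggests rather than Imports would be the natural place for it). A minimal sketch of the parallel path, with the bnlearn call left commented out since it needs a fitted dataset:

```r
library(parallel)

cl <- makeCluster(2)                      # two worker processes
# Resamples would then be learned in parallel, e.g.:
# strength <- bnlearn::boot.strength(f_dt, cluster = cl, R = 50,
#                                    algorithm = "hc")
# Quick sanity check that the workers respond:
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)                           # always release the workers
```

Benchmarking this against the sequential path on a realistic folded dataset would settle whether the dependency pays for itself.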
The dmmhc algorithm can be very time consuming, and bootstrapping multiplies its execution time by the number of resamples we want to perform. The user should expect this when performing bootstrapping, but if it is going to take a very long time, we should give the user some feedback via console output about how many DBN structures have already been computed and how many are left. This would be the same as with the psoho and natpsoho algorithms, which show a completion bar, so that the user is not left staring at a blank console with no idea of when the execution is going to finish. This is not a simple task, because the bootstrapping is completely encapsulated inside bnlearn.
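One way around that encapsulation would be a manual resampling loop, which makes per-replicate feedback trivial via `txtProgressBar`. This is only a sketch: `boot_with_progress` and `learn_one` are hypothetical names, and a dummy learner stands in for the real dmmhc call so the example is self-contained.

```r
# Manual bootstrap loop with console feedback. 'learn_one' stands in for a
# single structure-learning call on one resample.
boot_with_progress <- function(f_dt, nboots, learn_one) {
  pb <- txtProgressBar(min = 0, max = nboots, style = 3)
  nets <- vector("list", nboots)
  for (i in seq_len(nboots)) {
    rows <- sample(nrow(f_dt), replace = TRUE)
    nets[[i]] <- learn_one(f_dt[rows, , drop = FALSE])
    setTxtProgressBar(pb, i)          # user sees i / nboots completed
  }
  close(pb)
  nets
}

dummy_learner <- function(dt) nrow(dt)   # placeholder for the real learner
nets <- boot_with_progress(data.frame(x = 1:10), nboots = 5, dummy_learner)
```

The trade-off is that this bypasses bnlearn's strength/direction bookkeeping, which would then have to be reimplemented, so it is closer to the "bootstrapping native to dbnR" option mentioned above.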
All in all, the use of bootstrapping in the dmmhc algorithm is an interesting proposal, but it can end up having more complications than it is worth as a general-purpose method. It might be fine to have it available for users who know what they are doing, but at the moment I do not have the time to look into it further. If you need this functionality for your research, you can always point users to your fork of dbnR that includes the bootstrapped dmmhc method.
Applied the bootstrapping method to the dmmhc() algorithm by creating a bootstrapped network instead of a single one. Several checks can be added to ensure smooth functioning, and we can work on that, but we need a review of the bootstrapping implementation first. Please let me know if you need additional clarification.