dkesada / dbnR

Gaussian dynamic Bayesian networks structure learning and inference based on the bnlearn package
GNU General Public License v3.0

Psoho and natPsoho, problem in R #20

Closed IonasPi closed 1 year ago

IonasPi commented 1 year ago

Hi. I have a problem with the psoho and natPsoho algorithms. When I use them, RStudio says "R Session Aborted: R encountered a fatal error. The session was terminated". It happens even with the simplest case:

library(dbnR)
data(motor)
size <- 3
dt_train <- motor[200:2500]
dt_val <- motor[2501:3000]
net <- learn_dbn_struc(dt_train, size, method = "psoho")

It is not a memory problem, and when I use method = "dmmhc" the function works well.

A week ago it worked fine, but now it happens on two different computers, both with the motor data and with my own data. I have updated R to version 4.2.2 and RStudio to the latest version to try to solve it, but it still fails. I do not know whether this error is reproducible or only happening to me.

Thanks

dkesada commented 1 year ago

Hi! Thank you for the heads up. You are absolutely right, something underneath has broken. I do not know why it is happening now, though, since I haven't touched the psoho or the natPsoho algorithms in a while. I'll look into it and get back to you once I have debugged the issue.

dkesada commented 1 year ago

Apparently it's some kind of issue with a call to std::regex_match() down in C++ inside the rename_slices() function. I haven't touched this function since I created it, so it has to be some kind of incompatibility introduced by either R or Rcpp. Strangely enough, this fragment of code works on a native C++ compiler, but it does not when called from R. I'll try to find a solution for this, and if I cannot I will move this function back to native R. It will be slower, but at least we won't get cryptic aborted R sessions.
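For reference, here is a standalone sketch of this kind of call outside R, assuming the pattern and the input name from the minimal example posted in this thread. The helper name split_slice_name is hypothetical and is not the actual rename_slices() code; it only shows what the two capture groups are expected to hold when std::regex behaves.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper, not part of dbnR: splits a name such as "ambient_t_0"
// into its prefix ("ambient_t_") and its time-slice index ("0").
// Returns a pair of empty strings when the name does not match the pattern.
std::pair<std::string, std::string> split_slice_name(const std::string& name) {
    // Same pattern as in the minimal example: a prefix ending in "_t_",
    // followed by one or more digits up to the end of the string.
    static const std::regex re("^(.+_t_)([0-9]+)$");
    std::smatch m;
    if (!std::regex_match(name, m, re)) {
        return {std::string(), std::string()};
    }
    return {m[1].str(), m[2].str()};
}
```

Compiled natively, split_slice_name("ambient_t_0") should yield the prefix "ambient_t_" and the index "0".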

dkesada commented 1 year ago

Minimal complete and verifiable example:

library(Rcpp)

src <-
  '
  #include <regex>
  int tmp_regex(){
    std::string new_name = "ambient_t_0";
    std::smatch m;
    std::regex re("^(.+_t_)([0-9]+)$");
    std::regex_match(new_name, m, re);
    return 0;
  }
  '
Rcpp::cppFunction(src)
tmp_regex()

From what I'm reading, std::regex depends on C++11, and this has known issues on Windows. What I did not expect was for it to behave so unstably. The above code crashes because of the second part of the regex pattern, ([0-9]+); the rest of the pattern works fine. I'll follow Eddelbuettel's advice and switch to R's regex engine.
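One way to avoid std::regex altogether, sketched under the assumption that names always follow the <prefix>_t_<digits> shape of the pattern above, is to scan the string by hand. This is not the fix adopted here, which moves the matching to R's regex engine, and parse_slice_name is a hypothetical helper rather than dbnR code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of dbnR: regex-free check for names shaped
// like "<prefix>_t_<digits>". On success it fills `prefix` (including the
// trailing "_t_") and `index` (the digits) and returns true.
bool parse_slice_name(const std::string& name,
                      std::string& prefix,
                      std::string& index) {
    const std::string marker = "_t_";

    // Walk backwards over the trailing run of digits.
    std::size_t pos = name.size();
    while (pos > 0 && std::isdigit(static_cast<unsigned char>(name[pos - 1]))) {
        --pos;
    }
    if (pos == name.size()) {
        return false;  // no trailing digits at all
    }

    // At least one character must precede the "_t_" marker.
    if (pos < marker.size() + 1) {
        return false;
    }
    // The marker must sit immediately before the digits.
    if (name.compare(pos - marker.size(), marker.size(), marker) != 0) {
        return false;
    }

    prefix = name.substr(0, pos);
    index = name.substr(pos);
    return true;
}
```

For ordinary single-line names this is intended to accept the same inputs as the pattern in the example, at the cost of hard-coding the "_t_" naming convention.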

dkesada commented 1 year ago

I've fixed both the psoho and the natPsoho algorithms on the master and devel branches (475e7ca). Updating the package with devtools via install_github("dkesada/dbnR") should fix it on your end too.

IonasPi commented 1 year ago

Hi, I have followed your instructions, but it is not working. For a while even R did not work at all, and I had to uninstall RStudio. I think the problem was with Rtools. I will try again tomorrow. Thanks

IonasPi commented 1 year ago

Now I also have some problems with dmmhc: it does not work any more, even with simple models. Is it just me, or does anyone else have the same problem? I will try to reinstall Rtools and devtools. Thanks

IonasPi commented 1 year ago

OK, now it is working, but only psoho and natPsoho, not dmmhc. Hmmm...

IonasPi commented 1 year ago

OK, dmmhc works, but much, much slower, exponentially slower, if I can put it that way. This is a problem too. Do you plan to improve it in some way? Thanks

dkesada commented 1 year ago

Do you mean slower than before or slower than the other algorithms? I do not see any difference from the earlier performance, and there is not much I can do to speed up the dmmhc algorithm. For higher Markovian orders and bigger networks, dmmhc is slower by definition than the other two algorithms, and the only improvement I could achieve would be to code it all myself in C++, which would take a lot of time that I'm afraid I do not have at the moment.

IonasPi commented 1 year ago

I mean slower than before. Previously, with size 3, 75 variables (columns) and 1040 records (rows), dmmhc took around 45-60 seconds; now I do not know exactly, but probably more than 10 minutes. I cannot imagine how long size 6 would take: previously it took 14 hours, and now I do not know (perhaps 7 to 20 days). When I saw that it had not finished after three or four minutes, I interrupted it because I thought something was wrong. Then I checked that it worked with the motor data, so I thought it was a problem related to C++. Dmmhc has great characteristics in dbnR, and it would be great if we could recover the previous version. Maybe I can use the previous version by uninstalling the GitHub version and calling install.packages("dbnR") and library(dbnR), so that I can use one version or the other depending on whether I want dmmhc or psoho/natPsoho. These last two algorithms are really great too, and I am using them.