ekirving / qpbrute

Heuristic search algorithm for fitting qpGraph models
MIT License

qpBayes crashing - long vectors not supported yet (?) #19

Closed dbssyck closed 1 year ago

dbssyck commented 2 years ago

Hi Evan,

I'm re-running qpBayes to extend the number of iterations (that is, running qpBayes again after deleting all files in the bayes folder except the chain.csv files), but it crashes with the following error:

Traceback (most recent call last):
  File "/home/svu/dbssyck/conda_envs/qpbrute/bin/qpBayes", line 8, in <module>
    sys.exit(qpbayes())
  File "/home/svu/dbssyck/conda_envs/qpbrute/lib/python3.8/site-packages/qpbrute/qpbayes.py", line 372, in qpbayes
    calculate_bayes_factors(
  File "/home/svu/dbssyck/conda_envs/qpbrute/lib/python3.8/site-packages/qpbrute/qpbayes.py", line 294, in calculate_bayes_factors
    qpb.find_best_model()
  File "/home/svu/dbssyck/conda_envs/qpbrute/lib/python3.8/site-packages/qpbrute/qpbayes.py", line 236, in find_best_model
    run_cmd(
  File "/home/svu/dbssyck/conda_envs/qpbrute/lib/python3.8/site-packages/qpbrute/utils.py", line 67, in run_cmd
    raise RuntimeError(f"ERROR: '{err}'; RETCODE:{proc.returncode}\n" + " ".join(cmd))
RuntimeError: ERROR: 'b'Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : \n  long vectors not supported yet: ../../../include/Rinlinedfuns.h:522\nCalls: model_likelihood_n -> cbind -> <Anonymous> -> var\nExecution halted\n''; RETCODE:1
Rscript /home/svu/dbssyck/conda_envs/qpbrute/lib/python3.8/site-packages/qpbrute/rscript/bayes_factors.R angsd_1_sulawesi_r0.3 2800000

bayes.log shows that "Loading chain" completed for all the graphs, and the log ends at the following steps:

Merging replicate chains. Computing the likelihoods for all models.

In the bayes folder, there are no likelihoods.csv files, so something might have gone wrong around https://github.com/ekirving/qpbrute/blob/master/qpbrute/rscript/bayes_factors.R#:~:text=cat(%22Computing,likelihoods.csv%22)) ?

I have been struggling to troubleshoot the problem and would appreciate your assistance.

Thank you!

ekirving commented 2 years ago

I've not seen this error before, but some googling suggests that it happens when the size of a matrix exceeds the maximum size in R. You should be able to confirm if this is the case by increasing the size of the burn-in, which will reduce the total size of the combined chains.

dbssyck commented 2 years ago

Hi Evan,

Thanks, and indeed it seems to be a result of the chain size. I'm guessing this means there is an upper limit on the number of iterations that can be run with qpbayes?

ekirving commented 2 years ago

The Google results suggest that there are potential workarounds for dealing with very large matrices, but without access to a reproducible example, it's hard for me to debug the issue.

My guess here is that the issue is being caused by the combination of long chain lengths and multiple replicates, and that the result is exceeding the native size limit of a matrix in R.

There are two things you could do here to avoid the issue: (1) increase the burn-in; or (2) decrease the number of replicates.
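
As a rough way to compare those two options, the sketch below estimates how many elements a merged chain matrix would hold and checks it against 2^31 - 1, the maximum length of an ordinary (non-long) R vector. It assumes the merged matrix has one row per retained iteration per replicate and one column per parameter; the actual layout inside qpBrute may differ, and the function names and example numbers are hypothetical.

```python
# Maximum length of an ordinary (non-long) R vector: 2^31 - 1 elements.
# A matrix in R is a vector with dimensions, so the same limit applies
# to functions that do not support long vectors.
R_MAX_VECTOR_LENGTH = 2**31 - 1


def merged_chain_elements(iterations, burn_in, replicates, n_params):
    """Elements in a hypothetical merged matrix (assumed layout:
    retained iterations x replicates rows, n_params columns)."""
    retained = max(iterations - burn_in, 0)
    return retained * replicates * n_params


def exceeds_r_limit(iterations, burn_in, replicates, n_params):
    """True if the assumed merged matrix would need a long vector."""
    total = merged_chain_elements(iterations, burn_in, replicates, n_params)
    return total > R_MAX_VECTOR_LENGTH


# Illustrative numbers only, not taken from the run in this issue.
print(exceeds_r_limit(100_000_000, 10_000_000, 2, 15))  # True
print(exceeds_r_limit(100_000_000, 50_000_000, 2, 15))  # False
```

Under these assumptions, either a larger burn-in or fewer replicates shrinks the product linearly, which is why both options can bring the matrix back under the limit.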

An alternative solution would be to change the code to thin the chains. This used to be a feature, but I removed it as it is not analytically necessary, and I never personally encountered your issue (see https://github.com/ekirving/qpbrute/blob/master/qpbrute/rscript/model_likelihood.R#L222). The commit that removed thinning is here https://github.com/ekirving/qpbrute/commit/2e0c9001d9e5677e1b6e5d6a88c2a05f10e77dcc
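
For illustration, thinning simply means keeping every k-th sample after the burn-in. The sketch below shows the general idea in Python; it is not the code that was removed in the commit linked above, and `thin_chain` and its parameters are hypothetical names.

```python
def thin_chain(samples, burn_in=0, thin=1):
    """Drop the first `burn_in` samples, then keep every `thin`-th one.

    `samples` is any sequence of MCMC draws (for example, rows of a chain).
    """
    if burn_in < 0:
        raise ValueError("burn_in must be >= 0")
    if thin < 1:
        raise ValueError("thin must be >= 1")
    return samples[burn_in::thin]


# Ten draws, discard the first two, keep every third of the remainder.
chain = list(range(10))
print(thin_chain(chain, burn_in=2, thin=3))  # [2, 5, 8]
```

Thinning by a factor of k reduces the number of retained rows by roughly the same factor, so it trades a small loss of information for a proportionally smaller merged matrix.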

dbssyck commented 2 years ago

Thanks Evan, I'll try those workarounds; some of my chains just aren't converging. Does qpBrute have a function to calculate the multivariate version of the Gelman-Rubin metric you pointed out in this thread?

EDITED 2022-11-08: My bad, noticed that the multivariate version is reflected in likelihoods.log

Also, this is not the brightest of questions, but just to make sure I'm not misinterpreting anything regarding the Gelman-Rubin graph: in some cases the Shrink Factor y-axis on the graphs spans an extremely large range, and although the curve seems to reach an asymptote at a low value, I can't read that value off the plot. The PSRFs for these particular parameters in likelihoods.log are 1 or below 1.2, which suggests that the parameters should have converged. The Shrink Factor in the graphs and the PSRF refer to the same thing, right?
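
For reference, in standard Gelman-Rubin terminology "shrink factor" and "potential scale reduction factor" (PSRF) name the same quantity. A shrink-factor plot typically shows it recomputed on growing portions of the chains, so very large early values can stretch the y-axis even when the final value is close to 1. The sketch below computes the basic univariate PSRF, assuming equal-length chains; it omits the sampling-variability correction that implementations such as R's coda package apply, so its numbers will not match likelihoods.log exactly. The function names are hypothetical.

```python
import math
import statistics


def psrf(chains):
    """Basic univariate Gelman-Rubin PSRF for equal-length chains.

    W is the mean within-chain variance, B/n the variance of the chain
    means; the pooled estimate is (n - 1) / n * W + B / n.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    w = statistics.mean(statistics.variance(c) for c in chains)
    if w == 0:
        raise ValueError("within-chain variance is zero")
    b_over_n = statistics.variance([statistics.mean(c) for c in chains])
    pooled = (n - 1) / n * w + b_over_n
    return math.sqrt(pooled / w)


def psrf_trace(chains, lengths):
    """PSRF on the first k draws of every chain, for each k in `lengths`."""
    return [psrf([c[:k] for c in chains]) for k in lengths]


# Two chains sampling the same values versus two chains far apart.
print(psrf([[1, 2, 3, 4], [1, 2, 3, 4]]) < 1.2)      # True
print(psrf([[1, 2, 3, 4], [11, 12, 13, 14]]) > 1.2)  # True
```

Under this definition, the last point of a shrink-factor curve and the PSRF reported for the full chains describe the same statistic, so a final value at or below about 1.2 is the number to compare against the log.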