ekirving / qpbrute

Heuristic search algorithm for fitting qpGraph models
MIT License

error calculating Bayes factors #2

Closed by angelesdecara 5 years ago

angelesdecara commented 5 years ago

sim1.log

Dear Evan,

Sorry to bother you again. After running

python qpbrute.py \
    --par test/sim1.par \
    --prefix sim1 \
    --pops A B C X \
    --out Out

I get the log file attached (which doesn't seem very successful). I then try the Bayes factors and get the sim1.bayes.log attached. I suppose the Bayes factors issue is a consequence of the problems with running the graphs, but I am not absolutely sure. One side question: what format is the input geno file?

Many thanks for all your help!

sim1.bayes.log

ekirving commented 5 years ago

> I get the log file attached (which doesn't seem very successful).

That is not the expected result for the test dataset. It should find exactly one model which fits all the populations. Your log file is reporting that it found 1 outlier for each tested model, but that the worst Z-score for each was well below the normal threshold, which is very odd. There might be an issue parsing the output from qpGraph, as my version (6040) is a little older than yours.

Can you please upload the file graphs/sim1-dfafe63.log so I can compare the output with my own?

> I suppose the Bayes factors issue is a consequence of the problems with running the graphs, but I am not absolutely sure.

Yes, this is correct. If qpbrute.py cannot find any graphs that fit the data then qpbayes.py has no graphs to compute Bayes factors for.

> One side question: what format is the input geno file?

The input files test/{sim1.ind, sim1.snp, sim1.geno} are in EIGENSTRAT format, which is the default format used by qpGraph. This simulated dataset is the same as one that ships with AdmixTools. To run qpbrute.py and qpbayes.py on your own data, you will need to convert to EIGENSTRAT format using convertf from AdmixTools, and make your own parameter file for qpGraph.
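As a rough sketch of the conversion step described above, the snippet below builds the text of a convertf parameter file that requests EIGENSTRAT output. The helper name `convertf_par` and all file names are placeholders chosen for illustration; check the convertf documentation that ships with AdmixTools for the options that match your own data.

```python
def convertf_par(geno, snp, ind, out_prefix):
    """Return the text of a convertf parameter file that writes EIGENSTRAT output.

    All file names are supplied by the caller; nothing here is specific to qpbrute.
    """
    lines = [
        "genotypename:    {}".format(geno),
        "snpname:         {}".format(snp),
        "indivname:       {}".format(ind),
        "outputformat:    EIGENSTRAT",
        "genotypeoutname: {}.geno".format(out_prefix),
        "snpoutname:      {}.snp".format(out_prefix),
        "indivoutname:    {}.ind".format(out_prefix),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder input names; replace with the files for your own dataset.
    print(convertf_par("mydata.geno", "mydata.snp", "mydata.ind", "mydata_eig"))
```

Saving the printed text to a file such as convert.par and running `convertf -p convert.par` should then produce the three EIGENSTRAT files.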

angelesdecara commented 5 years ago

Here's the requested log: sim1-dfafe63.log

I asked about the geno file because it appears to be a binary file on my Mac. I'm converting my data to EIGENSTRAT format using convertf, but I'll wait until this issue is sorted out before running qpbrute on my data.

Thanks!
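One way to check whether a .geno file is plain-text EIGENSTRAT is sketched below. It relies only on the EIGENSTRAT convention of one line per SNP with one character per individual (0, 1, 2, or 9 for missing). The function name `looks_like_eigenstrat_geno` is made up for this example, and the check is a heuristic: a file that fails it may simply be stored in a different (for example packed) format.

```python
def looks_like_eigenstrat_geno(path, max_lines=1000):
    """Heuristic: True if the first lines contain only the characters 0, 1, 2 and 9."""
    allowed = set(b"0129")
    checked = 0
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.rstrip(b"\r\n")
            # An empty line or any other byte means this is not text EIGENSTRAT.
            if not line or not set(line) <= allowed:
                return False
            checked += 1
            if checked >= max_lines:
                break
    # An empty file is not a valid genotype file either.
    return checked > 0
```

For example, `looks_like_eigenstrat_geno("test/sim1.geno")` would report whether the test file passes this check.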

ekirving commented 5 years ago

In version 6412 of qpGraph they added an extra header row to the output. This confused the parser in qpbrute.py, which counted the header line as an outlier in the model. I've patched the code and pushed the change to GitHub.
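The kind of failure described here can be illustrated with a small sketch. This is not the actual patch to qpbrute.py, and the sample rows are invented for illustration rather than copied from qpGraph output. The only assumptions are that data rows end in a numeric Z-score, that a header row does not, and that |Z| >= 3 marks an outlier (the thread does not state the threshold).

```python
def count_outliers(rows, threshold=3.0):
    """Count rows whose final column is a Z-score with absolute value >= threshold.

    Rows whose final column is not a number (for example a header row) are
    skipped instead of being counted as outliers.
    """
    outliers = 0
    for row in rows:
        fields = row.split()
        if not fields:
            continue
        try:
            z_score = float(fields[-1])
        except ValueError:
            continue  # header or other non-data line
        if abs(z_score) >= threshold:
            outliers += 1
    return outliers


# Invented sample rows (hypothetical layout, NOT real qpGraph output).
sample = [
    "pop1 pop2 pop3 pop4 fit obs diff stderr Z",
    "A B C X 0.010 0.012 0.002 0.004 0.500",
    "A C B X 0.020 0.005 -0.015 0.004 -3.750",
]
print(count_outliers(sample))
```

A parser that counted every line after the section heading would report three outliers for this sample; skipping non-numeric rows leaves only the one row whose Z-score exceeds the assumed threshold.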

angelesdecara commented 5 years ago

Thanks! Not quite there yet though... After many calculations, it crashes with this error:

Traceback (most recent call last):
  File "qpbrute.py", line 577, in <module>
    permute_qpgraph(argv.par, argv.prefix, argv.pops, argv.out, nthreads=argv.threads)
  File "qpbrute.py", line 538, in permute_qpgraph
    pq.find_graph()
  File "qpbrute.py", line 498, in find_graph
    self.recurse_tree(root_tree, self.nodes[1], self.nodes[2:])
  File "qpbrute.py", line 104, in recurse_tree
    node_placed = self.check_results(results, remaining, depth)
  File "qpbrute.py", line 196, in check_results
    self.recurse_tree(new_tree, remaining[0], remaining[1:], depth + 1)
  File "qpbrute.py", line 104, in recurse_tree
    node_placed = self.check_results(results, remaining, depth)
  File "qpbrute.py", line 196, in check_results
    self.recurse_tree(new_tree, remaining[0], remaining[1:], depth + 1)
  File "qpbrute.py", line 140, in recurse_tree
    results = self.test_trees(admix_trees, depth)
  File "qpbrute.py", line 169, in test_trees
    results = pool.map(self.run_qpgraph, itertools.izip(new_trees, itertools.repeat(depth)))
  File "/Users/angeles/anaconda2/lib/python2.7/site-packages/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/Users/angeles/anaconda2/lib/python2.7/site-packages/multiprocess/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Users/angeles/anaconda2/lib/python2.7/site-packages/multiprocess/pool.py", line 572, in get
    raise self._value
ImportError: No module named graphviz

I've got graphviz installed (and have just reinstalled it with brew just in case), so that shouldn't be the source of error.

Many thanks again!

sim1.log

angelesdecara commented 5 years ago

And this is the traceback when running with --threads 1:

Traceback (most recent call last):
  File "qpbrute.py", line 577, in <module>
    permute_qpgraph(argv.par, argv.prefix, argv.pops, argv.out, nthreads=argv.threads)
  File "qpbrute.py", line 538, in permute_qpgraph
    pq.find_graph()
  File "qpbrute.py", line 498, in find_graph
    self.recurse_tree(root_tree, self.nodes[1], self.nodes[2:])
  File "qpbrute.py", line 104, in recurse_tree
    node_placed = self.check_results(results, remaining, depth)
  File "qpbrute.py", line 196, in check_results
    self.recurse_tree(new_tree, remaining[0], remaining[1:], depth + 1)
  File "qpbrute.py", line 104, in recurse_tree
    node_placed = self.check_results(results, remaining, depth)
  File "qpbrute.py", line 196, in check_results
    self.recurse_tree(new_tree, remaining[0], remaining[1:], depth + 1)
  File "qpbrute.py", line 140, in recurse_tree
    results = self.test_trees(admix_trees, depth)
  File "qpbrute.py", line 174, in test_trees
    result = self.run_qpgraph((new_tree, depth))
  File "qpbrute.py", line 310, in run_qpgraph
    pprint_qpgraph(dot_file, pdf_file)
  File "/Users/angeles/Downloads/qpbrute/utils.py", line 63, in pprint_qpgraph
    import graphviz
ImportError: No module named graphviz

ekirving commented 5 years ago

That error indicates that the Python package graphviz is not installed.

pip install graphviz
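The Graphviz program installed with brew and the Python package installed with pip are separate things, and the traceback concerns only the latter. A minimal sketch for checking both is shown below. It assumes Python 3 (the tracebacks in this thread come from Python 2.7, where `importlib.util.find_spec` and `shutil.which` are not available), and the function name `graphviz_status` is made up for this example.

```python
import importlib.util
import shutil


def graphviz_status():
    """Report whether the Python package and the `dot` executable can be found."""
    return {
        # True if `import graphviz` would find a module.
        "python_package": importlib.util.find_spec("graphviz") is not None,
        # True if the Graphviz `dot` program is on the PATH.
        "dot_executable": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, found in graphviz_status().items():
        print("{}: {}".format(name, "found" if found else "missing"))
```

Be sure to run the check with the same interpreter that runs qpbrute.py, since a package installed for one Python environment is not visible from another.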

angelesdecara commented 5 years ago

It works!

FINISHED: Found 1 unique solution(s) from a total of 196 unique graphs!

Is that the expected result?

ekirving commented 5 years ago

Yes, that is the expected result. That test dataset was simulated to have exactly one graph solution.

If you run qpbayes.py it will generate the Bayes factor for that single model. In cases where there are multiple fitting models, qpbayes.py will compute Bayes factors for all of them, and tell you which one fits best.
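For readers unfamiliar with the term, a Bayes factor is the ratio of the marginal likelihoods of two models, K = P(D | M1) / P(D | M2), which becomes a simple difference on the log scale. The sketch below illustrates only that comparison step. The thread does not describe how qpbayes.py estimates the marginal likelihoods, and the model names and numbers here are made up.

```python
import math


def log_bayes_factors(log_marginals):
    """Return the best-supported model and the log Bayes factor of that model
    against every model (0.0 against itself)."""
    best = max(log_marginals, key=log_marginals.get)
    factors = {
        name: log_marginals[best] - value for name, value in log_marginals.items()
    }
    return best, factors


# Made-up log marginal likelihoods for two hypothetical graphs.
example = {"graph_a": -1234.5, "graph_b": -1240.0}
best, log_k = log_bayes_factors(example)
print(best, log_k["graph_b"], math.exp(log_k["graph_b"]))
```

Working with log values avoids numerical underflow, since marginal likelihoods for real datasets are usually far too small to represent directly.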

angelesdecara commented 5 years ago

qpbayes.py is taking hours to compute the Bayes factor for that one single graph. That's worrying when I think of my own dataset... Many thanks!

ekirving commented 5 years ago

Yes, computing the Bayes factors will be very slow as they require long chain lengths to converge.

Good luck!