caitiecollins / treeWAS

treeWAS: A Phylogenetic Tree-Based Tool for Genome-Wide Association Studies in Microbes
Other
92 stars 18 forks source link

p.value question #49

Closed mrchase62 closed 4 years ago

mrchase62 commented 4 years ago

Hi Caitie, This may be more of a question rather than an issue. I am trying to alter the threshold set by 'p.value' to look further down a list of significant snps in a dataset with treeWAS. I am seeing the top hits I expect for drug resistant strains of Mycobacterium tuberculosis and would like to look further down the rabbit hole. No matter what I set it to ( for example p.value = 0.05 or even 0.1) the returned results only appear to be for the default ( p.value = 0.01). I am doing something wrong or is there another way to go about this. I am ultimately trying to get access to more of the output. Kind regards, Michael

caitiecollins commented 4 years ago

Hi Michael,

It's possible that the same set of sites is exceeding the significance threshold, whether you're setting it to 0.01 or 0.1. By the sound of it, you may be looking at the results summary, which would only show you the top hits. But, if you wanted to explore the data further, you can see the results in detail for each site by examining the output object itself.

## run treeWAS, assigning the output to object 'out':
out <- treeWAS(snps=snps, phen=phen, tree=tree, seed = 1)

## print summary of results:
print(out)

## examine structure of output object:
str(out)

The output contains, among other things, the p-values and the association score values for each genetic locus in the dataset (in their original order). You can explore these values for each of the three association scores that treeWAS calculates.

For example, if you wanted to see the p-values for Score 2 that were below 0.01 (or whatever value you were interested in):

## get score 2 p-values:
pv2 <- out$simultaneous$p.vals

## see values below 0.01:
pv2[which(pv2 < 0.01)]

Or, if you wanted to see which sites had a Score 1 value above a certain level (e.g., if you had seen that a couple points exceeded 0.7 on the Manhattan plot and you wanted to see which ones they were):

## get score 1 association score values:
score1 <- out$terminal$corr.dat

## see values above 0.7:
score1[which(score1 > 0.7)]

Is that the kind of thing you were trying to do? If not, let me know and I’ll try and help you find what you’re looking for.

If you’re trying to comb through the output, you may find the GitHub treeWAS Wiki useful as a guide to help you decrypt all that information: https://github.com/caitiecollins/treeWAS/wiki/4.-Interpreting-Output.

Thanks for your question.

All the best, Caitlin.

mrchase62 commented 4 years ago

Hi Caitlin, Thanks for answering the question. I can now find the data I was looking for from the analysis. Kind regards, Michael