caitiecollins / treeWAS

treeWAS: A Phylogenetic Tree-Based Tool for Genome-Wide Association Studies in Microbes

Error in inputting snps ASR #64

Closed moorembioinfo closed 1 year ago

moorembioinfo commented 1 year ago

Hi Caitlin!

I've run into an error when providing my own ancestral states to treeWAS. The error states that my snps matrix and ASR matrix have a different number of columns. I can't see that I've done anything wrong: the two matrices, genomat and ancmat, are generated from the same matrix* and have the same number of columns and the same column names:

> dim(ancmat)
[1] 141 916
> dim(genomat)
[1]  71 916
> all(colnames(ancmat) %in% colnames(genomat))
[1] TRUE

*They're loaded in simply with:

genodf = read.csv("tip_binary.csv", row.names=1)
genomat = as.matrix(genodf)
ancdf = read.csv("binary_genotypes.csv", row.names=1)
ancmat = as.matrix(ancdf)

Here tip_binary.csv is a direct subset (tips only) of binary_genotypes.csv, which contains the states for the tips and all internal nodes.
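For anyone wanting to confirm that the tips-only matrix really is a subset of the full reconstruction, a minimal sketch in base R follows. The toy matrices and row labels (t1-t3 for tips, n4-n5 for internal nodes) are made up for illustration and are not the data from this issue. Note that `%in%` only tests membership, so `identical()` is used here to also check column order.

```r
# Toy stand-ins (hypothetical data): ancmat holds tips + internal nodes,
# genomat holds the tips only. Values are filled column by column.
ancmat <- matrix(c(0, 1, 1, 0, 1,
                   1, 1, 0, 0, 1,
                   0, 0, 1, 1, 0),
                 nrow = 5, ncol = 3,
                 dimnames = list(c("t1", "t2", "t3", "n4", "n5"),
                                 c("snp1", "snp2", "snp3")))
genomat <- ancmat[c("t1", "t2", "t3"), , drop = FALSE]

# Same column names in the same order (stricter than %in%).
same_col_order <- identical(colnames(ancmat), colnames(genomat))

# Every tip row of genomat is present in ancmat.
tips_present <- all(rownames(genomat) %in% rownames(ancmat))

# The tip rows of ancmat carry exactly the same states as genomat.
tips_match <- tips_present &&
  identical(ancmat[rownames(genomat), colnames(genomat), drop = FALSE], genomat)

print(c(same_col_order = same_col_order,
        tips_present   = tips_present,
        tips_match     = tips_match))
```

If all three values are TRUE, the inputs agree at the tips, and any difference in column counts must arise later in processing.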

The error message (two warnings):

Warning messages:
1: In treeWAS(snps = genomat, phen = phen, tree = tree2, snps.reconstruction = ancmat,  :
  Careful-- snps and snps.rec should have the same index when reduced
                    to their unique forms.

2: In treeWAS(snps = genomat, phen = phen, tree = tree2, snps.reconstruction = ancmat,  :
  The number of columns in the provided snps.reconstruction is not equal to the number of
                    columns in the snps matrix. Performing a new parsimonious reconstruction instead.

Thanks in advance for any help with this. All the best, Matt


Update: Within treeWAS the matrices end up being:

> ncol(snps.reconstruction)
[1] 419
> ncol(snps)
[1] 416

Presumably get.unique.matrix() won't return the same number of columns for the tips-only matrix as for the tips + internal nodes matrix?
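A small sketch of how this mismatch could arise, using base R's `unique()` with `MARGIN = 2` as a stand-in for column deduplication. This is an assumption for illustration only: it does not call treeWAS's get.unique.matrix(), and the toy data and labels are hypothetical. Two SNPs that are identical at the tips can still differ at an internal node, so the tips-only matrix collapses to fewer unique columns than the full reconstruction.

```r
# Rows t1-t3 are tips; n4-n5 are internal nodes (hypothetical labels).
anc <- matrix(c(0, 1, 1, 0, 1,   # snpA
                0, 1, 1, 1, 1,   # snpB: equals snpA at the tips, differs at n4
                1, 0, 0, 1, 0),  # snpC
              nrow = 5, ncol = 3,
              dimnames = list(c("t1", "t2", "t3", "n4", "n5"),
                              c("snpA", "snpB", "snpC")))
tips <- anc[c("t1", "t2", "t3"), , drop = FALSE]

# Count distinct column patterns in each matrix.
n_unique_tips <- ncol(unique(tips, MARGIN = 2))  # snpA and snpB collapse: 2
n_unique_all  <- ncol(unique(anc,  MARGIN = 2))  # all three stay distinct: 3

print(c(tips_only = n_unique_tips, tips_plus_internal = n_unique_all))
```

This is consistent in direction with the counts above, where the reconstruction keeps more unique columns (419) than the tips-only snps matrix (416).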

caitiecollins commented 1 year ago

Hello,

Would it be possible for you to share your data with me? It would speed up the investigation on my end. If you can, please email your snps, phen, tree, and snps.reconstruction to me at caitiecollins@gmail.com.

If not, please let me know, and I will look into this for you as best I can.

Thank you. Best, Caitlin.

caitiecollins commented 1 year ago

Hi Matt,

I've pushed a change that will now allow you to proceed with your inputted snps.reconstruction. You just need to re-download and install the treeWAS package.

treeWAS should now run without stopping, but it will still give you a warning, because your snps.rec and snps should not typically reduce to a different number/set of unique columns.

Either way, with your ancmat reconstruction, treeWAS does find one significant site via the simultaneous test. If you set phen.reconstruction = "ml", it finds 8 significant sites. However, none of these are found if you replace ancmat with a new reconstruction.

I’ll send you some plots via email with some more comments about your particular reconstruction.

Best, Caitlin.

caitiecollins commented 1 year ago

Unrelated, but I just wanted to note that treeWAS has a new "correct.prop" argument that corrects treeWAS tests 1 and 3 for the relative proportions of individuals in each phenotypic class. Your phen is not too imbalanced, but the shape of your Score 3 null distribution suggests it might be best to set correct.prop = TRUE.