caitiecollins / treeWAS

treeWAS: A Phylogenetic Tree-Based Tool for Genome-Wide Association Studies in Microbes
Other
92 stars 18 forks source link

Tree Plot Error: Error in num2col(var, col.pal = seasun) : could not find function "num2col" #34

Closed WallyL closed 2 years ago

WallyL commented 5 years ago

I've been trying to get this to run for awhile with no success and wondering if someone could offer any help. I have created the snp matrix and a phen vector with continuous variables that have been ranked. I have tired the following basic calls and a few variations:

out <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, phen.type = "continuous")
out <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, snps.reconstruction = "parsimony", phen.reconstruction = "parsimony", phen.type = "continuous")  

Each time I get this error:

If tree is not a phylo object, please specify one of the following reconstruction methods: 'NJ', 'BIONJ', 'parsimony', 'NJ', 'BIONJ'. Choosing 'BIONJ' by default. Setting 10 negative branch lengths to zero. Reconstruction type must be 'ML' when variable is 'continuous'. Setting type to 'ML'. Error in num2col(var, col.pal = seasun) : could not find function "num2col"

I see from the code that this is coming from the in the plot tree sub routine, but don't know why it's being thrown or how to remedy. Thanks.

cizydorczyk commented 5 years ago

I think you need to specify a phylogenetic tree in the treeWAS function, which you appear to be missing in your call to the function. If you have already created a tree, you need to use the 'tree=' argument. Otherwise, the error is telling you that you need to specify a tree-building method and treeWAS will build the tree for you (e.g. 'tree="NJ"' for a neighbor joining tree).

Not sure about some of the other error messages though (i.e. "could not find function num2col").

Hope that helps.

WallyL commented 5 years ago

Thank you for the suggestion. I had actually tried that already a couple of ways as simplest cases:

out <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "NJ", phen.type = "continuous")
out <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "parsimony", phen.type = "continuous")

However, it seems that when using a continuous variable the reconstruction must be maximum likelihood so it automatically defaults to that and then I still get essentially the same errors, i.e. using tree = "NJ" I get:

Setting 15 negative branch lengths to zero. Reconstruction type must be 'ML' when variable is 'continuous'. Setting type to 'ML'. Error in num2col(var, col.pal = seasun) : could not find function "num2col"

and using "parsimony" (this takes several minutes, so I thought is was working) I get:

Reconstruction type must be 'ML' when variable is 'continuous'. Setting type to 'ML'. Error in num2col(var, col.pal = seasun) : could not find function "num2col"

caitiecollins commented 5 years ago

Hi Wally,

I'm sorry about the error you're running into and the ensuing irritation it has likely caused. It looks like an attempt to make a few formatting changes to some key package dependencies has had some unintended downstream effects. Thank you for bringing this to my attention. With any luck, I should be able to figure out what's been changed and what needs amending in relatively short order.

But, if I'm right in thinking this is what's caused your problem, it may be temporarily fixed by just hard loading the required/missing dependency package(s) in your current session, alongside treeWAS.

For a start, could you please try running: library(adegenet)

And then could I ask you to please let me know if, when you re-try running your code above, whether you get the same, different, or no error messages?

Thanks very much for passing on this info.

Best wishes, Caitlin.

PS---Many thanks for chiming in to help, Conrad. Much appreciated :)

WallyL commented 5 years ago

Thanks, Caitlin.

I had actually just taken a closer look at your code minutes ago and loaded the adegenet and ape libs and now it seems to be running happily...

Best, Walt

caitiecollins commented 5 years ago

Amazing! I'm glad that's working for you.

Sorry for the added headache. Once I've made the neccessary changes, you may want to check back and just reinstall the package to ensure that it works hitch-free in future.

I'll be here if you need anything. And please do let me know if you get any more unexpected errors. It's a big help.

All the best, Caitlin.

WallyL commented 5 years ago

Thanks, Caitlin. I do have a question regarding the results of my runs. My snp matrix has the following features:

dim(snps.m3.matrix)
[1]    85 62099
str(snps.m3.matrix)
 int [1:85, 1:62099] 0 1 0 0 0 NA 1 0 0 NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:85] "PANS_1_1" "PANS_1_10" "PANS_1_2" "PANS_1_5" ...
  ..$ : chr [1:62099] "X43.12213.g" "X43.12214.a" "X43.12215.c" "X43.12220.t" ...

And my phen file is like this:

str(phen.vector)
 Named num [1:85] 56 56 22 2 4 56 56 56 29 56 ...
 - attr(*, "names")= chr [1:85] "PANS_1_1" "PANS_1_10" "PANS_1_2" "PANS_1_5" ...

So I've run with the tree flagged for either "NJ" or "parsimony".

out <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "NJ", phen.type = "continuous")
out2 <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "parsimony", phen.type = "continuous")

The phenotypic data were given to me as numerical data ranging from 0 to 4.51 and Xavier suggested that I use Rank to collapse to integers, which I did, but there are still 56 members/levels(?) in 85 samples, which I realize is a lot. (Note that these are not my data and I have very limited experience with GWAS analyses...)

In both cases, I get trees that are fairly similar. Unfortunately, also in both cases, there are no loci identified, i.e. here are the results for out and out2:

out
    #################### 
    ## treeWAS output ## 
    #################### 

    #################### 
    ## All findings:  ## 
    #################### 
Number of significant loci: [1] 0

    ######################## 
    ## Findings by test:  ## 
    ######################## 
     ####################  
     ##  terminal test ## 
     ####################  
Number of significant loci: 
[1] 0
Significance threshold: 
99.99999% 
0.4638574 
     ########################  
     ##  simultaneous test ## 
     ########################  
Number of significant loci: 
[1] 0
Significance threshold: 
99.99999% 
 2.587129 
     ######################  
     ##  subsequent test ## 
     ######################  
Number of significant loci: 
[1] 0
Significance threshold: 
99.99999% 
 79.62617 
> out2
    #################### 
    ## treeWAS output ## 
    #################### 

    #################### 
    ## All findings:  ## 
    #################### 
Number of significant loci: [1] 0

    ######################## 
    ## Findings by test:  ## 
    ######################## 
     ####################  
     ##  terminal test ## 
     ####################  
Number of significant loci: 
[1] 0
Significance threshold: 
99.99999% 
0.4743957 
     ########################  
     ##  simultaneous test ## 
     ########################  
Number of significant loci: 
[1] 0
Significance threshold: 
99.99999% 
 2.556948 
     ######################  
     ##  subsequent test ## 
     ######################  
Number of significant loci: 
[1] 0
Significance threshold: 
99.99999% 
 82.61206 

I guess I would expect to see something, but perhaps the large number of phenotypes relative to sample # could be leading to this result? Am I running the program correctly for this type of dataset and/or do you have any suggestions regarding any specific parameter flags that I might employ in order to tease something out of these data?

Thank you again for your help.

Best, Walt

caitiecollins commented 5 years ago

Hi Walt,

I'm happy to help wherever I can. Before I try to offer you advice though, it would be helpful if I could take a look at the plots output by treeWAS (particularly the trees and the null distributions). Might you be able to send me a pdf of those plots?

Also, if your data isn't confidential and you're comfortable sharing it with me, that would obviously give me the best insight, but that's totally up to you.

All the best, Caitlin. Department of Infectious Disease Epidemiology Imperial College London St. Mary's Hospital Praed Street, London, U.K. W2 1NY

On Mon, Apr 15, 2019 at 8:51 PM Walt Lorenz notifications@github.com wrote:

Thanks, Caitlin. I do have a question regarding the results of my runs. My snp matrix has the following features:

dim(snps.m3.matrix) [1] 85 62099 str(snps.m3.matrix) int [1:85, 1:62099] 0 1 0 0 0 NA 1 0 0 NA ...

  • attr(*, "dimnames")=List of 2 ..$ : chr [1:85] "PANS_1_1" "PANS_1_10" "PANS_1_2" "PANS_1_5" ... ..$ : chr [1:62099] "X43.12213.g" "X43.12214.a" "X43.12215.c" "X43.12220.t" ...

And my phen file is like this:

str(phen.vector) Named num [1:85] 56 56 22 2 4 56 56 56 29 56 ...

  • attr(*, "names")= chr [1:85] "PANS_1_1" "PANS_1_10" "PANS_1_2" "PANS_1_5" ...

So I've run with the tree flagged for either "NJ" or "parsimony".

out <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "NJ", phen.type = "continuous") out2 <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "parsimony", phen.type = "continuous")

The phenotypic data were given to me as numerical data ranging from 0 to 4.51 and Xavier suggested that I use Rank to collapse to integers, which I did, but there are still 56 members/levels(?) in 85 samples, which I realize is a lot. (Note that these are not my data and I have very limited experience with GWAS analyses...)

In both cases, I get trees that are fairly similar. Unfortunately, also in both cases, there are no loci identified, i.e. here are the results for out and out2:

out ####################

treeWAS output

####################

####################

All findings:

#################### Number of significant loci: [1] 0

########################

Findings by test:

######################## ####################

terminal test

#################### Number of significant loci: [1] 0 Significance threshold: 99.99999% 0.4638574 ########################

simultaneous test

######################## Number of significant loci: [1] 0 Significance threshold: 99.99999% 2.587129 ######################

subsequent test

###################### Number of significant loci: [1] 0 Significance threshold: 99.99999% 79.62617

out2 ####################

treeWAS output

####################

####################

All findings:

#################### Number of significant loci: [1] 0

########################

Findings by test:

######################## ####################

terminal test

#################### Number of significant loci: [1] 0 Significance threshold: 99.99999% 0.4743957 ########################

simultaneous test

######################## Number of significant loci: [1] 0 Significance threshold: 99.99999% 2.556948 ######################

subsequent test

###################### Number of significant loci: [1] 0 Significance threshold: 99.99999% 82.61206

I guess I would expect to see something, but perhaps the large number of phenotypes relative to sample # could be leading to this result? Am I running the program correctly for this type of dataset and/or do you have any suggestions regarding any specific parameter flags that I might employ in order to tease something out of these data?

Thank you again for your help.

Best, Walt

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/caitiecollins/treeWAS/issues/34#issuecomment-483393418, or mute the thread https://github.com/notifications/unsubscribe-auth/AHRqqxspVCpzpjgj8YLZgm4Ze57E6rwTks5vhNhDgaJpZM4cwMGX .

WallyL commented 5 years ago

Hi Caitlin,

Great! I'm happy to share what I have.

Both the snp matrix and phen.vector validate per tests shown in the manual. I really struggled to create the phen file as I was trying to replicate something like in the binary example file and I kept trying to convert using as.factor, but then finally realized that a vector would work ( I think!) for continuous phenotype data.

Here are the four runs tested thus far: out <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "NJ", phen.type = "continuous") out2 <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "parsimony", phen.type = "continuous") out3 <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "parsimony", phen.type = "continuous", seed = 1) out4 <- treeWAS(snps = snps.m3.matrix, phen = phen.vector, tree = "NJ*", phen.type = "continuous")

I also have the outputs you requested for the four different runs. Since there were two images created successively in the console during each run, I didn't know how to export them individually as the program was running, so I just created .pngs using screen capture as they appeared. The image files are self-explanatory relative to the 4 runs, i.e. NJ, NJ_star (didn't know what NJ* did, but tried it anyway), parsimony and parsimony_seed1.

Finally, I did notice that when the final histogram was being generated, the XTerm flickered back and forth and I could see something like colored dot groupings; however, after doing that a few times it ended up with the final histogram that I have included in the link. Not sure what that was about, but it happened more than once.

Thank you very much for offing to take a look at all of this!

Best,

Walt

caitiecollins commented 5 years ago

Hi Walt,

Thanks for sharing your data with me. I think I may have made some progress!

I've just pushed a bunch of updates to the package onto GitHub, including a few that had accumulated. So hopefully if you re-install the package, some things may behave a little nicer.

Before I get to your data, two general points: () First, I would say, especially if you're new to treeWAS and GWAS more generally, I would definitely recommend reading through the treeWAS tutorial (you can find it on the GitHub Wiki https://github.com/caitiecollins/treeWAS/wiki, among other places). () Second, it sounds like you may be working in R without a good R user interface? If so, you might benefit from installing something like RStudio https://www.rstudio.com/products/RStudio/#Desktop (easy to install, much easier to use).

At the moment, from what you've said, I think you're only seeing the last of 7 output plots! Flashing past you should be: () 1 tree () 3 manhattan plots (1 per association score) (*) 3 null distribution histograms (1 per assoc. score) It sounds like you've only been seeing the final null dist histogram, for Score 3 (which is often the least powerful of the three assoc.tests).

Installing something like Rstudio would allow you to click back and forth between plots, and to save them as pdfs/pngs easily. But, there's also actually an argument to treeWAS (filename.plot="your_filename_here.pdf") that will allow you to automatically save all plots generated by a given run of treeWAS in a single multi-page pdf file.

########################################################## Ok, so with respect to your analysis, here's the code I used to play around with your data:

Running NEWest version of treeWAS:

library(devtools)

install treeWAS from github:

install_github("caitiecollins/treeWAS", build_vignettes = TRUE) library(treeWAS)

Load data:

setwd("./treeWAS Forum Qs/WallyL/") snps <- get(load("./snps_clean.Rdata")) phen <- get(load("./phen_clean.Rdata")) str(snps) str(phen)

Run treeWAS:

out <- treeWAS(snps = snps, phen = phen, tree = "NJ", phen.type = "continuous", seed = 1, filename.plot = "./treeWAS_plots.pdf")

Save output:

save(out, file="./treeWAS_out.Rdata")

You can now see these results in plot form in the "treeWAS_plots.pdf" file. You can also check and save a summary of the output from all three scores by using the print/write.treeWAS() functions:

print(out) write.treeWAS(out, file="./treeWAS_output.csv")

From this run, it now looks like we did actually find a couple associations with Score 2 (my personal favourite score). Although, I'm not sure why this is a different result from before... It's possible there was a bug in the package that's now been resolved in the current update. Please let me know if you get a different result.

If this is a stable result though, then that's certainly good news. The only other caveat/consideration I would note is that all of the associations appear to occur at the very tail end of your sequences. A peak at the ends of sequences can sometimes occur if there are data quaility issues. But, as long as you're confident in your sequence data quality, then it looks like you may now have a few significant candidate associations to investigate further!

Ok, let me know what you think, or if you have any questions.

All the best, Caitlin.

PS--- The NJ*() function is just a version of the regular neighbour-joining (NJ) tree-building method that allows you to apply NJ to a distance matrix that may contain missing values.

caitiecollins commented 5 years ago

Hi Walt,

Happy Easter/Passover! I believe I have fixed the bug!

I was evidently a little too pleased with myself, having made a number of changes that looked successful on my end, and I didn't realise I'd actually left you with a broken package when I absconded for the Easter long weekend. I apologise for leaving you in the lurch like that.

I've run a bunch of checks and made a few key fixes and, as far as I can tell, everything ought to be working well now. So I'm hoping you'll be able to write back with positive news of an easy re-installation and a successful re-run of your analysis. If you run into any trouble, though, I'll be here to troubleshoot.

Best of luck, Caitlin.

WallyL commented 5 years ago

Hi Caitlin,

No worries and Happy Easter to you also. I hope you had an enjoyable holiday weekend!

I actually went back and installed the previous version and ran it in RStudio. I still want to drop one of the samples and perform some reruns, so I’ll install the new package today and see how it goes.

Interestingly, one of our interns was able to install the broken package on a server running Ubuntu 18.04 (using the same route as I had tried and failed with on my boxes) AND the library loaded for her without an error; however, when I tried to run my data it threw an error regarding inability to locate filename(?).rdp. That’s when I was certain there was an issue and reverted back to the older package (had to load adegenet separately) and was up and running again.

I’ll be in touch.

Best, Walt

W. Walter Lorenz, Ph.D. Asst. Research Scientist | Institute of Bioinformatics Lead Bioinformatics Consultant | Georgia Genomics and Bioinformatics Core University of Georgia | B320A Life Sciences Building | Athens, GA 30602 Office: 706-542-7974 | Fax: 706-542-1738 wlorenz@uga.eduwlorenz@uga.edu%20 | http://dna.uga.eduhttp://dna.uga.edu/

[Screen-GGBC-H-FC - Copy]

From: caitiecollins notifications@github.com Sent: Wednesday, April 24, 2019 6:36 AM To: caitiecollins/treeWAS treeWAS@noreply.github.com Cc: Walt Lorenz wlorenz@uga.edu; Author author@noreply.github.com Subject: Re: [caitiecollins/treeWAS] Tree Plot Error: Error in num2col(var, col.pal = seasun) : could not find function "num2col" (#34)

Hi Walt,

Happy Easter/Passover! I believe I have fixed the bug!

I was evidently a little too pleased with myself, having made a number of changes that looked successful on my end, and I didn't realise I'd actually left you with a broken package when I absconded for the Easter long weekend. I apologise for leaving you in the lurch like that.

I've run a bunch of checks and made a few key fixes and, as far as I can tell, everything ought to be working well now. So I'm hoping you'll be able to write back with positive news of an easy re-installation and a successful re-run of your analysis. If you run into any trouble, though, I'll be here to troubleshoot.

Best of luck, Caitlin.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/caitiecollins/treeWAS/issues/34#issuecomment-486168733, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AG6H3CMX6DBLQC634WFCCCDPSAZXRANCNFSM4HGAYGLQ.

WallyL commented 5 years ago

Hi Caitlin,

Just an FYI that the new lib. installed and is working fine.

Thank you again for your help!

Best, Walt

W. Walter Lorenz, Ph.D. Asst. Research Scientist | Institute of Bioinformatics Lead Bioinformatics Consultant | Georgia Genomics and Bioinformatics Core University of Georgia | B320A Life Sciences Building | Athens, GA 30602 Office: 706-542-7974 | Fax: 706-542-1738 wlorenz@uga.eduwlorenz@uga.edu%20 | http://dna.uga.eduhttp://dna.uga.edu/

[Screen-GGBC-H-FC - Copy]

From: Walt Lorenz Sent: Wednesday, April 24, 2019 10:36 AM To: caitiecollins/treeWAS reply@reply.github.com Subject: RE: [caitiecollins/treeWAS] Tree Plot Error: Error in num2col(var, col.pal = seasun) : could not find function "num2col" (#34)

Hi Caitlin,

No worries and Happy Easter to you also. I hope you had an enjoyable holiday weekend!

I actually went back and installed the previous version and ran it in RStudio. I still want to drop one of the samples and perform some reruns, so I’ll install the new package today and see how it goes.

Interestingly, one of our interns was able to install the broken package on a server running Ubuntu 18.04 (using the same route as I had tried and failed with on my boxes) AND the library loaded for her without an error; however, when I tried to run my data it threw an error regarding inability to locate filename(?).rdp. That’s when I was certain there was an issue and reverted back to the older package (had to load adegenet separately) and was up and running again.

I’ll be in touch.

Best, Walt

W. Walter Lorenz, Ph.D. Asst. Research Scientist | Institute of Bioinformatics Lead Bioinformatics Consultant | Georgia Genomics and Bioinformatics Core University of Georgia | B320A Life Sciences Building | Athens, GA 30602 Office: 706-542-7974 | Fax: 706-542-1738 wlorenz@uga.eduwlorenz@uga.edu%20 | http://dna.uga.eduhttp://dna.uga.edu/

[Screen-GGBC-H-FC - Copy]

From: caitiecollins notifications@github.com<mailto:notifications@github.com> Sent: Wednesday, April 24, 2019 6:36 AM To: caitiecollins/treeWAS treeWAS@noreply.github.com<mailto:treeWAS@noreply.github.com> Cc: Walt Lorenz wlorenz@uga.edu<mailto:wlorenz@uga.edu>; Author author@noreply.github.com<mailto:author@noreply.github.com> Subject: Re: [caitiecollins/treeWAS] Tree Plot Error: Error in num2col(var, col.pal = seasun) : could not find function "num2col" (#34)

Hi Walt,

Happy Easter/Passover! I believe I have fixed the bug!

I was evidently a little too pleased with myself, having made a number of changes that looked successful on my end, and I didn't realise I'd actually left you with a broken package when I absconded for the Easter long weekend. I apologise for leaving you in the lurch like that.

I've run a bunch of checks and made a few key fixes and, as far as I can tell, everything ought to be working well now. So I'm hoping you'll be able to write back with positive news of an easy re-installation and a successful re-run of your analysis. If you run into any trouble, though, I'll be here to troubleshoot.

Best of luck, Caitlin.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/caitiecollins/treeWAS/issues/34#issuecomment-486168733, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AG6H3CMX6DBLQC634WFCCCDPSAZXRANCNFSM4HGAYGLQ.