Closed cizydorczyk closed 7 years ago
Hi Conrad,
Sorry for the delay--I've been a bit busy. I'll make time to look into your problem today and tomorrow, though.
I will say, it would definitely make my job easier if you were able to share your data (tree, snps, snps.rec), provided you can do so without any privacy issues (possibly by giving the samples new names, if needed).
I may be able to find the issue on my own, but I'll almost certainly be faster if I can work with your data. Any chance you could share it?
Thank you, Caitlin.
Hello Caitlin,
Thank you for your response - much appreciated.
Would you still like me to share the data I am trying to use with you? No privacy issues - I work mostly with anonymous data anyway.
Best regards,
Conrad
On Mon, Aug 21, 2017 at 11:17 AM, caitiecollins notifications@github.com wrote:
I take that back. I think I may have already found it.
- CC
Hi,
Ok, so, I think I've resolved the issue.
It was getting stuck on a check for the row order of snps/snps.rec/tree. I've reworked it a bit to try and make it as foolproof as possible. There's a new requirement now. Essentially, if you are providing either the snps.reconstruction or phen.reconstruction, you will need to ensure that the order of the rows/indices corresponding to the internal nodes in these reconstructions actually matches the order in the tree.
The way treeWAS is going to handle this is by requiring that your tree contains a label in tree$node.label corresponding to each internal node and that snps.reconstruction, in rows Nterminal+1 to Ntotal, has the same set of labels in whichever order is appropriate.
Generally speaking, I believe that the order of these nodes in the tree and reconstruction should match if they've come out of ClonalFrameML. At the moment, at least, treeWAS will assume that if the order of snps.reconstruction rows 1 to Nterminal matches the order of tree$tip.label, then the internal nodes also match (in which case it will do the labelling of tree$node.label etc. for you and spit out a notice or warning that it's doing so).
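A minimal sketch of the label check described above, in base R. This is not treeWAS's actual code: the tree is a plain list standing in for an ape "phylo" object, the labels and SNP values are made up, and the manual reordering at the end is just one way to line the rows up with the tree.

```r
# Mock stand-ins (hypothetical): a real analysis would use the tree and the
# reconstruction matrix that came out of ClonalFrameML.
tree <- list(
  tip.label  = c("A", "B", "C"),
  node.label = c("N1", "N2")
)
n.tips  <- length(tree$tip.label)
n.nodes <- length(tree$node.label)

# Reconstruction: rows are terminal nodes followed by internal nodes,
# columns are SNPs. The internal rows are deliberately out of order here.
snps.rec <- matrix(
  c(0, 1, 1, 0, 1,
    1, 1, 0, 1, 0),
  nrow = 5, ncol = 2,
  dimnames = list(c("A", "B", "C", "N2", "N1"), c("snp1", "snp2"))
)

# 1. Rows 1 to Nterminal should match tree$tip.label, in order.
tips.ok <- identical(rownames(snps.rec)[1:n.tips], tree$tip.label)

# 2. Rows Nterminal+1 to Ntotal should carry the same set of labels
#    as tree$node.label (order may differ).
internal  <- rownames(snps.rec)[(n.tips + 1):(n.tips + n.nodes)]
labels.ok <- setequal(internal, tree$node.label)

# 3. Reorder rows by label so they follow the tree: tips, then internal nodes.
snps.rec.ordered <- snps.rec[c(tree$tip.label, tree$node.label), , drop = FALSE]
```

If either check is FALSE, the row names of the reconstruction are the first thing to inspect before passing it on.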
All of which is to say that I'm pretty sure it's fixed and will work fine (once you've re-downloaded and installed the updates).
Of course, if it's not and you run into any other kind of issue, I think having your data would probably help me for any new corrections I might need to make.
Last point: I think I mention this somewhere in the documentation, but I would generally recommend that you use the internal snps reconstruction functions instead of providing your own reconstruction. The reason is that treeWAS needs to compare the snps reconstruction to the reconstruction that it does for the simulated snps, so it is best that they be done in as similar a way as possible.
Anyway, thank you very much for letting me know about this issue. And don't hesitate to ask if you run into any other issues or have any questions.
All the best, CC.
Hi Caitlin,
Thank you for the update and for taking the time to resolve this issue. As you mention, I have actually used the internal snps reconstruction functions and it worked, and will probably go ahead with using that instead of providing my own (ClonalFrameML) reconstruction.
I was curious if there would be any differences in results if I ran it using my own reconstructions as well, given that my snp alignment is rather large (~200 K snps).
I shall try it out again soon!
That said, if I may ask two quick questions about treeWAS implementation/result interpretation: I have run treeWAS on SNPs multiple times and have gotten slightly different results each time when using different seeds. Would this be expected, presumably because of the randomness in the data simulations performed?
Lastly, do you think treeWAS would scale well to a kmer-based analysis, where one would be dealing with a much larger number of potentially significant kmers than snps?
Best regards,
Conrad
Hi,
Yes, of course you are welcome to compare results with internal and external reconstructions. I just wanted to offer a word of caution on why you might see differences there and what problems might arise with the external reconstruction.
And you are absolutely correct in your interpretation of why results may vary slightly between runs when using different seeds. There are elements of randomness within the data simulation, so although the data simulations may be based on the same parameters, you will see some variation there.
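To illustrate the point about seeds, here is a small base R sketch. The function `sim.threshold` is a hypothetical stand-in, not part of treeWAS: it draws random "null" scores and takes an upper quantile as a threshold, which is enough to show how the seed affects anything derived from simulated data.

```r
# Hypothetical stand-in for a simulation step: draw null scores at random
# and use an upper quantile of them as a significance threshold.
sim.threshold <- function(seed, n.sim = 1000) {
  set.seed(seed)
  null.scores <- runif(n.sim)
  unname(quantile(null.scores, 0.999))
}

t1       <- sim.threshold(seed = 1)
t1.again <- sim.threshold(seed = 1)
t2       <- sim.threshold(seed = 2)

# Same seed, same parameters: the threshold is reproduced exactly.
same.seed.matches <- identical(t1, t1.again)

# Different seed, same parameters: the threshold shifts slightly.
diff.seed.differs <- t1 != t2
```

Fixing the seed makes a run reproducible; findings that stay significant across several seeds are less likely to depend on one particular simulation.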
Regarding using treeWAS on kmers, I would say that in theory the approach should be applicable, but I would imagine most machines would probably lack the memory to actually run treeWAS on kmers and that you might also struggle with reduced power as a result of the expanded need to correct for multiple testing.
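A rough back-of-envelope sketch of both concerns, in base R. The assumptions are loud ones: the k-mer count and the significance level are hypothetical, the correction is taken to be Bonferroni-style, and the memory figure covers a single dense numeric matrix (8 bytes per cell) with one row per terminal and internal node, not treeWAS's real footprint.

```r
n.snps  <- 2e5    # ~200 K SNPs, as reported in this thread
n.kmers <- 1e7    # hypothetical k-mer count, for illustration only
alpha   <- 0.05   # hypothetical significance level

# Bonferroni-style per-test threshold: alpha divided by the number of tests.
thr.snps  <- alpha / n.snps    # 2.5e-07
thr.kmers <- alpha / n.kmers   # 5e-09

# Size in GB of one dense numeric matrix with 325 rows
# (163 terminal + 162 internal nodes, as in this thread).
n.rows <- 163 + 162
size.gb <- function(n.cols) n.rows * n.cols * 8 / 1e9

gb.snps  <- size.gb(n.snps)    # 0.52
gb.kmers <- size.gb(n.kmers)   # 26
```

Under these assumptions, the per-test threshold becomes 50 times stricter and a single matrix grows from about half a gigabyte to roughly 26 GB, before counting any simulated data.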
Instead of running treeWAS on kmers, you can get decent coverage of the genetic variation by running treeWAS on both SNPs and matrices of gene presence/absence. Both analyses should use the same tree, i.e., a tree built from whole-genome sequences. By analysing both SNPs and gene presence/absence you can identify relevant variation in both the core and accessory genomes. By using kmers you may just end up expending a good deal of computational time and individual effort in running the analysis, and likely tracking down kmer variation that corresponds to regions within the SNPs or gene presence/absence matrices anyway. Beyond this, kmers might allow you to identify, for example, variation in promoter regions, but I'm not sure the benefits would be extensive.
Anyway, to answer your question directly, I think the reconstruction and simulation elements of treeWAS might have trouble scaling up to the scope required for a kmer-based analysis. I think it would work in theory but that the benefits would be small relative to the potentially prohibitive computational effort.
Best, Caitlin.
Hello,
Great job on the program - looks very promising!
Unfortunately, I am having some issues implementing treeWAS using ClonalFrameML output. I can load the data just fine using the read.CFML() function, as per "ClonalFrameML Integration" instructions, but run into an error when trying to specify the snps.reconstruction by using the reconstruction provided by ClonalFrameML:
At which point the process terminates.
Here is how I run the treeWAS function:
I have tried checking whether the snps1 matrix and snps.rec matrix have the same number of columns, and they do. They differ in the number of rows, the snps.rec matrix containing 325 rows for all terminal (163) and internal (162) nodes, whereas the snps1 matrix only contains rows for terminal nodes (163). I have also tried a) inputting a matrix of only internal nodes (162 nodes - the number specified in the tree1 object) to the snps.reconstruction argument, and b) inputting a matrix of only terminal nodes (163) to the snps.reconstruction argument, but neither has solved the issue.
When I try running treeWAS without inputting the snps.rec matrix (using parsimony reconstruction instead), treeWAS runs successfully with no issues. However, I would like to be able to compare the results with those output when the snps.reconstruction argument is provided.
Any help in resolving this issue would be greatly appreciated.
Thank you,
Conrad Izydorczyk