Closed mattbawn closed 5 years ago
Hi Matt,
Thanks very much for using the tool! Unfortunately rJava is notoriously temperamental, as far as I can see this error is being caused by rJava rather than homoplasyFinder and hopefully we can get to the bottom of it.
I have a couple of things to try. Firstly and you may have already done this, but could you run the following code and send me any error messages:
# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
# Get the current working directory
workingDirectory <- paste(getwd(), "/")
# Run the HomoplasyFinder java code
inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, workingDirectory)
Next, could you run the following code and send me any error messages:
# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
# Run the HomoplasyFinder jar tool
runHomosyFinderJarTool(treeFile, fastaFile)
The runHomoplasyFinderJarTool()
function in the second code block is a function that runs HomoplasyFinder without using rJava, hopefully that'll work! Both these code blocks are using files that I have attached within the homoplasyFinder
package.
Let me know how you get on and thanks again for using the tool.
Joe
Hi Joe,
Thanks for the reply. For the first code block:
> inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, workiIdentified 10 positions with consistency index < 1:
57
179
207
241
339
534
559
689
696
771
Error in rJava::.jcall(javaHomoplasyFinderClass, method = "runHomoplasyFinderFromR", :
java.io.IOException: No such file or directory
and for the second:
Error in runHomosyFinderJarTool(treeFile, fastaFile) :
could not find function "runHomosyFinderJarTool"
Thanks
Matt
Hi Matt,
Thanks for trying those code blocks. For the second code block, I wrote a typo - sorry! - it should be:
# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
# Run the HomoplasyFinder jar tool
runHomoplasyFinderJarTool(treeFile, fastaFile)
Your output from the first block is confusing. It seems like HomoplasyFinder worked, because it reported the positions of the homoplasies:
57
179
207
241
339
534
559
689
696
771
But it is giving a new error, a file not found error - which should be pretty easy to solve.
Would you mind following these steps, essentially re-running the code again but we'll use screenshots of the code and output to try and get more information:
# Clear the environment
rm(list=ls())
gc()
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder") treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
workingDirectory <- paste(getwd(), "/")
inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, workingDirectory)
3. Take screenshot and send it
4. Clear your environment, save workspace and restart R/Rstudio
5. Create a new R script and paste into it the following code (same as first code block I sent, sorry to make you run it again):
rm(list=ls()) gc()
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder") treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
runHomoplasyFinderJarTool(treeFile, fastaFile)
6. Take screenshot and send it
Let me know how you get on, sorry that this is a bit of a pain!
Joe
Hi Matt,
Could you also print the fastaFile, treeFile and workingDirectory variables that you create when you run the first code block:
print(fastaFile)
print(treeFile)
print(workingDirectory)
And send me the output and check that these files exist on your computer.
Thanks for sticking with it!
Joe
Hi Joe
So I am running R on a cluster. 1st code block:
2nd code block
I notice on your code that you don't need library(HomoplasyFinder)
. I was wondering perhaps you use a local version that is different to that that I used devtools
to download from github?
Thanks,
Matt
Hi Matt,
I think you are right, it is a problem with the github version of homoplasyFinder
. I've replicated the problem on my computer and should have a solution soon. Sorry for the trouble and time this is costing.
Joe
Hi Matt,
I've spotted the s=problem with the first code block. Could you change this line:
workingDirectory <- paste(getwd(), "/")
To:
workingDirectory <- paste0(getwd(), "/")
Let me know if that works. I'll let you know when I've fixed the second code block.
Joe
Hi Matt,
I've found the problem with the second code block as well, a problem with the NAMESPACE
file. I've fixed this issue and updated the github respository, if you re-install homoplasyFinder
, code block 2 should work as well.
Also, you were correct, both code blocks are missing a library(homoplasyFinder)
line.
Joe
Hi Matt,
Line 8 of your error message tells me you didn't change the workingDirectory <- paste(getwd(), "/")
to workingDirectory <- paste0(getwd(), "/")
because there is space present in the path (before the "/"), which is causing the error. Could you change the offending line and try the script again.
Joe
Thanks Joe,
I hadnt seen your latest comments when I uploaded the screen shot. After downloading the newest version and making th escript changes it now works with the example data.
I am trying to use my own data now and was wondering how you would recommend I do this?
I am currentlt using ape
:
fastaFile <- read.FASTA("core_all_rep_full.fa")
treeFile <- read.tree("RAxML_bestTree.raxml-all-rep-snippy")
Thanks again for your help,
Matt
Hi Matt,
That's excellent! Yes, I thought you must have missed that comment. For your own code I would recommend the following code:
fastaFile <- "path/to/core_all_rep_full.fa"
treeFile <- "path/to/RAxML_bestTree.raxml-all-rep-snippy"
workingDirectory <- "path/to/where/you/want/the/output/files/"
The reading of the files into R
objects is handled internally in the runHomoplasyFinderInJava()
function so all you need is the full paths to the files. You'll need to replace the "path/to/..."
with the path to these files on your server. I would always recommend providing the full path to FASTA and TREE files on your computer/server as it makes the code clearer.
Let me know how you get on.
Joe
Hi Joe,
when I run with my files I get:
Created output FASTA file without inconsistent sites:
/nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/nInconsistentSites_20-02-19.fasta
Created Newick phylogenetic tree file with annotated inconsistent positions:
/nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/annotatedNewickTree_20-02-19.tree
Created report detailing the inconsistent sites identified:
/nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/consistencyIndexReport_20-02-19.txt
*** caught segfault ***
address 0x572b120, cause 'memory not mapped'
Traceback:
1: FUN(X[[i]], ...)
2: lapply(STRING, .treeBuild)
3: ape::read.tree(paste0(path, "annotatedNewickTree_", date, ".tree"))
4: readAnnotatedTree(workingDirectory)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault
Any Ideas?
Thanks,
Matt
Hi Matt,
Yes, I've seen this error before. You must be analysing a large dataset, this error is caused by running out of RAM in R. Add this code to increase your RAM limit to java applications:
options(java.parameters = "-Xmx10000m") # Sets RAM limit to 10GB
to the first line of your R script before you load any R packages. The above code sets the RAM limit to 10GB but if you have more RAM to spare and you keep seeing the problem you can increase it.
Let me know if that works, thanks for sticking with it!
Joe
Hi Joe,
I had increased the memory to 30G anyway due to a previous error so now I've increased the memory to 80G but get the following:
Created output FASTA file without inconsistent sites:
/nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/nInconsistentSites_20-02-19.fasta
Created Newick phylogenetic tree file with annotated inconsistent positions:
/nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/annotatedNewickTree_20-02-19.tree
Created report detailing the inconsistent sites identified:
/nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/consistencyIndexReport_20-02-19.txt
Error in FUN(X[[i]], ...) :
attempt to set index 12166283/130 in SET_STRING_ELT
Calls: readAnnotatedTree -> <Anonymous> -> lapply -> FUN
Execution halted
Thanks,
Matt
Hi Matt,
Wow, your dataset must be massive! Do you mind if I ask how many sequences you have and what the length of your alignment is?
The good news is it looks like HomoplasyFinder has worked, the error is coming from the readAnnotatedTree()
function. This function is just a single line:
readAnnotatedTree <- function(path, date=format(Sys.Date(), "%d-%m-%y")){
return(ape::read.tree(paste0(path, "annotatedNewickTree_", date, ".tree")))
}
There are a few things to do to look into this error. Firstly to check HomoplasyFinder has worked:
readAnnotatedTree()
from your script and run it againIf all these files look alright then and step 1 runs without any errors then HomoplasyFinder has worked and it is the read.tree()
function in ape
that is throwing the error. To test the read.tree()
function in ape you could try the following code by itself (assuming you have kept the annotated tree file):
ape::read.tree(annotatedNewickTree_20-02-19.tree)
Could you also try reading in your original tree file into R, that you send into the runHomoplasyFinderInJava()
function:
ape::read.tree(treeFile)
The error message is quite cryptic but it has been reported in R before, I think it might be caused by the size of the annotated newick tree file. I might need to change the way that I annotate the homoplasies on the phylogeny.
Thanks for pointing out these issues!
Joe
Hi Joe,
Thanks, yes I was using the full genome alignment so I could more easily (lazily) trace back to the chromosomal locations. so 134 sequenced of 5mb. Now I am using just the variant sites it runs to completion.
The output pdf is a little busy though, do you know how I can resize it?
I tried:
png("Homoplasy.pdf", # create PNG for the heat map
width = 50, # 5 x 300 pixels
height = 10,
#res = 300, # 300 pixels per inch
pointsize = 8) # smaller font size
# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile)
dev.off()
but got an X11
error:
Error in png("Homoplasy.pdf", width = 50, height = 10, pointsize = 8) :
X11 is not available
Execution halted
Thank you again for your help!
Matt
Hi Matt,
Excellent, I'm really glad it is running without errors. To make a slightly better plot you could try the following code:
# Open a PDF
pdf("Homoplasy.pdf",
width = 20, # In inches
height = 20) # In inches
# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile,
addNodeLabels=FALSE, # Remove the internal node labels - too messy!
alignmentCex=4, # Increase the size
alignmentPositionCex=2) # Scale the size of the position labels
# Close the PDF
dev.off()
A couple of points:
pdf()
function in R. A PDF will scale much better and shouldn't give that error you showed. You can try changing the width
and height
parameters and that will change how bunched togather everything is.alignmentCex
and alignmentPositionCex
parameters of the plotAnnotatedTree()
I must admit that the plotAnnotatedTree()
function definitely needs a little more work and I'll be trying to improve it with time.
Thanks,
Joe
Hi Matt,
I have made a couple of slight improvements to the plotAnnotatedTree()
function. Aside from minor code changes, I've add two new parameters:
addSeparatorLines
- before white lines were plotted between sites, this is now optional and should make your alignment less opaque looking
actualPositions
- This parameter allows you to send a different vector of positions that matches the length of the inconsistentPositions vector but ideally contains the actual positions on your genome. Might be handy for you.
I hope these are useful.
Joe
Thanks Joe,
I'll let you know how I get on.
Matt
I have been trying to get your package working for a couple of days now, the main issue being problems installing
rJava
.In the end I think it is installed properly. I have installed it and rJava (JDK-11.0.2) seperately into a cluster environment in R-3.5.2
I am going through your code with my own data however, I get the following: