JosephCrispell / homoplasyFinder

A tool to identify and annotate homoplasies on a phylogeny and sequence alignment
GNU General Public License v3.0
19 stars 3 forks source link

Error inconsistentPositions #1

Closed mattbawn closed 5 years ago

mattbawn commented 5 years ago

I have been trying to get your package working for a couple of days now, the main issue being problems installing rJava.

In the end I think it is installed properly. I have installed it and rJava (JDK-11.0.2) seperately into a cluster environment in R-3.5.2

I am going through your code with my own data however, I get the following:

inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, workingDirectory)
Error in rJava::.jcall(javaHomoplasyFinderClass, method = "runHomoplasyFinderFromR",  : 
  method runHomoplasyFinderFromR with signature (Ljava/lang/String;ZZZZZZ)[I not found

Any Ideas?

Thanks,

Matt
JosephCrispell commented 5 years ago

Hi Matt,

Thanks very much for using the tool! Unfortunately rJava is notoriously temperamental, as far as I can see this error is being caused by rJava rather than homoplasyFinder and hopefully we can get to the bottom of it.

I have a couple of things to try. Firstly and you may have already done this, but could you run the following code and send me any error messages:

# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Get the current working directory
workingDirectory <- paste(getwd(), "/")

# Run the HomoplasyFinder java code
inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, workingDirectory)

Next, could you run the following code and send me any error messages:

# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Run the HomoplasyFinder jar tool
runHomosyFinderJarTool(treeFile, fastaFile)

The runHomoplasyFinderJarTool() function in the second code block is a function that runs HomoplasyFinder without using rJava, hopefully that'll work! Both these code blocks are using files that I have attached within the homoplasyFinder package.

Let me know how you get on and thanks again for using the tool.

Joe

mattbawn commented 5 years ago

Hi Joe,

Thanks for the reply. For the first code block:

> inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, workiIdentified 10 positions with consistency index < 1: 
57
179
207
241
339
534
559
689
696
771
Error in rJava::.jcall(javaHomoplasyFinderClass, method = "runHomoplasyFinderFromR",  : 
  java.io.IOException: No such file or directory

and for the second:

Error in runHomosyFinderJarTool(treeFile, fastaFile) : 
  could not find function "runHomosyFinderJarTool"

Thanks

Matt

JosephCrispell commented 5 years ago

Hi Matt,

Thanks for trying those code blocks. For the second code block, I wrote a typo - sorry! - it should be:

# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Run the HomoplasyFinder jar tool
runHomoplasyFinderJarTool(treeFile, fastaFile) 

Your output from the first block is confusing. It seems like HomoplasyFinder worked, because it reported the positions of the homoplasies:

57
179
207
241
339
534
559
689
696
771

But it is giving a new error, a file not found error - which should be pretty easy to solve.

Would you mind following these steps, essentially re-running the code again but we'll use screenshots of the code and output to try and get more information:

  1. Clear your environment, save workspace and restart R/Rstudio
  2. Create a new R script and paste into it the following code (same as first code block I sent, sorry to make you run it again):
    
    # Clear the environment
    rm(list=ls())
    gc()

Find the FASTA and tree files attached to package

fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder") treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

Get the current working directory

workingDirectory <- paste(getwd(), "/")

Run the HomoplasyFinder java code

inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, workingDirectory)

3. Take screenshot and send it
4. Clear your environment, save workspace and restart R/Rstudio
5. Create a new R script and paste into it the following code (same as first code block I sent, sorry to make you run it again):

Clear the environment

rm(list=ls()) gc()

Find the FASTA and tree files attached to package

fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder") treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

Run the HomoplasyFinder jar tool

runHomoplasyFinderJarTool(treeFile, fastaFile)


6. Take screenshot and send it

Let me know how you get on, sorry that this is a bit of a pain!

Joe
JosephCrispell commented 5 years ago

Hi Matt,

Could you also print the fastaFile, treeFile and workingDirectory variables that you create when you run the first code block:

print(fastaFile)
print(treeFile)
print(workingDirectory)

And send me the output and check that these files exist on your computer.

Thanks for sticking with it!

Joe

mattbawn commented 5 years ago

Hi Joe

So I am running R on a cluster. 1st code block: screenshot 2019-02-20 08 47 24

2nd code block screenshot 2019-02-20 08 49 17

I notice on your code that you don't need library(HomoplasyFinder). I was wondering perhaps you use a local version that is different to that that I used devtools to download from github?

Thanks,

Matt

JosephCrispell commented 5 years ago

Hi Matt,

I think you are right, it is a problem with the github version of homoplasyFinder. I've replicated the problem on my computer and should have a solution soon. Sorry for the trouble and time this is costing.

Joe

JosephCrispell commented 5 years ago

Hi Matt,

I've spotted the s=problem with the first code block. Could you change this line:

workingDirectory <- paste(getwd(), "/")

To:

workingDirectory <- paste0(getwd(), "/")

Let me know if that works. I'll let you know when I've fixed the second code block.

Joe

JosephCrispell commented 5 years ago

Hi Matt,

I've found the problem with the second code block as well, a problem with the NAMESPACE file. I've fixed this issue and updated the github respository, if you re-install homoplasyFinder, code block 2 should work as well.

Also, you were correct, both code blocks are missing a library(homoplasyFinder) line.

Joe

mattbawn commented 5 years ago

screenshot 2019-02-20 09 30 30

JosephCrispell commented 5 years ago

Hi Matt,

Line 8 of your error message tells me you didn't change the workingDirectory <- paste(getwd(), "/") to workingDirectory <- paste0(getwd(), "/") because there is space present in the path (before the "/"), which is causing the error. Could you change the offending line and try the script again.

Joe

mattbawn commented 5 years ago

Thanks Joe,

I hadnt seen your latest comments when I uploaded the screen shot. After downloading the newest version and making th escript changes it now works with the example data.

I am trying to use my own data now and was wondering how you would recommend I do this?

I am currentlt using ape:

fastaFile <- read.FASTA("core_all_rep_full.fa")
treeFile <- read.tree("RAxML_bestTree.raxml-all-rep-snippy")

Thanks again for your help,

Matt

JosephCrispell commented 5 years ago

Hi Matt,

That's excellent! Yes, I thought you must have missed that comment. For your own code I would recommend the following code:

fastaFile <- "path/to/core_all_rep_full.fa"
treeFile <- "path/to/RAxML_bestTree.raxml-all-rep-snippy"
workingDirectory <- "path/to/where/you/want/the/output/files/"

The reading of the files into R objects is handled internally in the runHomoplasyFinderInJava() function so all you need is the full paths to the files. You'll need to replace the "path/to/..." with the path to these files on your server. I would always recommend providing the full path to FASTA and TREE files on your computer/server as it makes the code clearer.

Let me know how you get on.

Joe

mattbawn commented 5 years ago

Hi Joe,

when I run with my files I get:

Created output FASTA file without inconsistent sites:
    /nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/nInconsistentSites_20-02-19.fasta
Created Newick phylogenetic tree file with annotated inconsistent positions:
    /nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/annotatedNewickTree_20-02-19.tree
Created report detailing the inconsistent sites identified:
    /nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/consistencyIndexReport_20-02-19.txt

 *** caught segfault ***
address 0x572b120, cause 'memory not mapped'

Traceback:
 1: FUN(X[[i]], ...)
 2: lapply(STRING, .treeBuild)
 3: ape::read.tree(paste0(path, "annotatedNewickTree_", date, ".tree"))
 4: readAnnotatedTree(workingDirectory)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault

Any Ideas?

Thanks,

Matt

JosephCrispell commented 5 years ago

Hi Matt,

Yes, I've seen this error before. You must be analysing a large dataset, this error is caused by running out of RAM in R. Add this code to increase your RAM limit to java applications:

options(java.parameters = "-Xmx10000m") # Sets RAM limit to 10GB

to the first line of your R script before you load any R packages. The above code sets the RAM limit to 10GB but if you have more RAM to spare and you keep seeing the problem you can increase it.

Let me know if that works, thanks for sticking with it!

Joe

mattbawn commented 5 years ago

Hi Joe,

I had increased the memory to 30G anyway due to a previous error so now I've increased the memory to 80G but get the following:

Created output FASTA file without inconsistent sites:
    /nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/nInconsistentSites_20-02-19.fasta
Created Newick phylogenetic tree file with annotated inconsistent positions:
    /nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/annotatedNewickTree_20-02-19.tree
Created report detailing the inconsistent sites identified:
    /nbi/Research-Groups/IFR/Rob-Kingsley/Homoplasy/mydata/retry/consistencyIndexReport_20-02-19.txt
Error in FUN(X[[i]], ...) : 
  attempt to set index 12166283/130 in SET_STRING_ELT
Calls: readAnnotatedTree -> <Anonymous> -> lapply -> FUN
Execution halted

Thanks,

Matt

JosephCrispell commented 5 years ago

Hi Matt,

Wow, your dataset must be massive! Do you mind if I ask how many sequences you have and what the length of your alignment is?

The good news is it looks like HomoplasyFinder has worked, the error is coming from the readAnnotatedTree() function. This function is just a single line:

readAnnotatedTree <- function(path, date=format(Sys.Date(), "%d-%m-%y")){
     return(ape::read.tree(paste0(path, "annotatedNewickTree_", date, ".tree")))
}

There are a few things to do to look into this error. Firstly to check HomoplasyFinder has worked:

  1. Could you remove the readAnnotatedTree() from your script and run it again
  2. Could you have a look at the output files, note they are dated according to when the analysis was done. There should be three:
    • nInconsistentSites_20-02-19.fasta - you could view this using something like aliview
    • annotatedNewickTree_20-02-19.tree - you could try opening this in a text editor, icytree or figtree
    • consistencyIndexReport_20-02-19.txt - try opening this in a text editor, excel, or libreoffice calc

If all these files look alright then and step 1 runs without any errors then HomoplasyFinder has worked and it is the read.tree() function in ape that is throwing the error. To test the read.tree() function in ape you could try the following code by itself (assuming you have kept the annotated tree file):

ape::read.tree(annotatedNewickTree_20-02-19.tree)

Could you also try reading in your original tree file into R, that you send into the runHomoplasyFinderInJava() function:

ape::read.tree(treeFile)

The error message is quite cryptic but it has been reported in R before, I think it might be caused by the size of the annotated newick tree file. I might need to change the way that I annotate the homoplasies on the phylogeny.

Thanks for pointing out these issues!

Joe

mattbawn commented 5 years ago

Hi Joe,

Thanks, yes I was using the full genome alignment so I could more easily (lazily) trace back to the chromosomal locations. so 134 sequenced of 5mb. Now I am using just the variant sites it runs to completion.

The output pdf is a little busy though, do you know how I can resize it?

I tried:

 png("Homoplasy.pdf",    # create PNG for the heat map
     width = 50,        # 5 x 300 pixels
     height = 10,
     #res = 300,            # 300 pixels per inch
     pointsize = 8)        # smaller font size

# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile)
 dev.off()

but got an X11 error:

Error in png("Homoplasy.pdf", width = 50, height = 10, pointsize = 8) : 
  X11 is not available
Execution halted

Rplots.pdf

Thank you again for your help!

Matt

JosephCrispell commented 5 years ago

Hi Matt,

Excellent, I'm really glad it is running without errors. To make a slightly better plot you could try the following code:

 # Open a PDF
pdf("Homoplasy.pdf",
     width = 20, # In inches
     height = 20) # In inches

# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile, 
                               addNodeLabels=FALSE, # Remove the internal node labels - too messy!
                               alignmentCex=4, # Increase the size 
                               alignmentPositionCex=2) # Scale the size of the position labels

# Close the PDF
dev.off()

A couple of points:

  1. As you'll see in the above code, I would use the pdf() function in R. A PDF will scale much better and shouldn't give that error you showed. You can try changing the width and height parameters and that will change how bunched togather everything is.
  2. You might need to tweak the alignmentCex and alignmentPositionCex parameters of the plotAnnotatedTree()

I must admit that the plotAnnotatedTree() function definitely needs a little more work and I'll be trying to improve it with time.

Thanks,

Joe

JosephCrispell commented 5 years ago

Hi Matt,

I have made a couple of slight improvements to the plotAnnotatedTree() function. Aside from minor code changes, I've add two new parameters: addSeparatorLines - before white lines were plotted between sites, this is now optional and should make your alignment less opaque looking actualPositions - This parameter allows you to send a different vector of positions that matches the length of the inconsistentPositions vector but ideally contains the actual positions on your genome. Might be handy for you.

I hope these are useful.

Joe

mattbawn commented 5 years ago

Thanks Joe,

I'll let you know how I get on.

Matt