JosephCrispell / homoplasyFinder

A tool to identify and annotate homoplasies on a phylogeny and sequence alignment
GNU General Public License v3.0

error while running tool in R #8

Closed tanu121093 closed 4 years ago

tanu121093 commented 4 years ago

hi,

  1. I am getting the following error while running the tool in R version 3.4.4:

    runHomoplasyFinderInJava(home/plab/core_gene.tree, home/plab/core_gene_alignment.fasta, home/plab)
    Error in rJava::.jcall(javaHomoplasyFinderClass, method = "runHomoplasyFinderFromR", :
      object 'home' not found

  2. The Java version I am using:

    openjdk 11.0.4 2019-07-16
    OpenJDK Runtime Environment (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3)
    OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Ubuntu-1ubuntu218.04.3, mixed mode, sharing)

  3. Also, every time I use homoplasyFinder in R and run the command "runHomoplasyFinderInJava", it says command not found.

  4. Why do I have to install the rJava package every time I start homoplasyFinder in R?

  5. Information regarding Java on my system (Ubuntu 18.04 LTS):

    ~$ sudo R CMD javareconf
    Java interpreter : /usr/lib/jvm/default-java/bin/java
    Java version     : 11.0.4
    Java home path   : /usr/lib/jvm/default-java
    Java compiler    : /usr/lib/jvm/default-java/bin/javac
    Java headers gen.: /usr/bin/javah
    Java archive tool: /usr/lib/jvm/default-java/bin/jar

    trying to compile and link a JNI program
    detected JNI cpp flags    : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/linux
    detected JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
    gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/lib/jvm/default-java/include -I/usr/lib/jvm/default-java/include/linux -fpic -g -O2 -fdebug-prefix-map=/build/r-base-AitvI6/r-base-3.4.4=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c conftest.c -o conftest.o
    g++ -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o conftest.so conftest.o -L/usr/lib/jvm/default-java/lib/server -ljvm -L/usr/lib/R/lib -lR

    JAVA_HOME        : /usr/lib/jvm/default-java
    Java library path: $(JAVA_HOME)/lib/server
    JNI cpp flags    : -I$(JAVA_HOME)/include -I$(JAVA_HOME)/include/linux
    JNI linker flags : -L$(JAVA_HOME)/lib/server -ljvm
    Updating Java configuration in /usr/lib/R
    Done.
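
The error in item 1 is consistent with the file paths being typed without quotes. R then parses home/plab/core_gene.tree as a division of variables and fails because no object called `home` exists. The sketch below reproduces that message and shows the quoted form. The absolute paths (with a leading slash) and the helper `findMissingFiles()` are assumptions for illustration, and whether quoting was the only problem in the original setup is not known.

```r
# Unquoted, the path is parsed as an arithmetic expression on variables,
# so evaluating it fails because there is no object named `home`
unquotedError <- tryCatch(
  eval(parse(text = "home/plab/core_gene.tree")),
  error = function(e) conditionMessage(e)
)
print(unquotedError)

# Quoted, the same text is an ordinary character string
# (paths are assumed to be absolute; adjust to your own system)
treeFile <- "/home/plab/core_gene.tree"
fastaFile <- "/home/plab/core_gene_alignment.fasta"
outputPath <- "/home/plab/"

# Hypothetical helper: return the input paths that do not exist on disk
findMissingFiles <- function(paths) {
  paths[!file.exists(paths)]
}
missingFiles <- findMissingFiles(c(treeFile, fastaFile))

# With valid, quoted paths the call from item 1 would become:
# runHomoplasyFinderInJava(treeFile = treeFile, fastaFile = fastaFile, path = outputPath)
```

If `missingFiles` is not empty, the paths need correcting before homoplasyFinder is run.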

JosephCrispell commented 4 years ago

Hi,

Thanks very much for using homoplasyFinder and sorry it isn't working!

I am having trouble reproducing your error, so would you mind trying the following steps to help find out what is causing it? It sounds like a problem with your rJava installation, but there could well be a bug in homoplasyFinder.

Firstly, would you mind removing the homoplasyFinder and rJava packages:

remove.packages("homoplasyFinder")
remove.packages("rJava")

Then restarting R and installing them again:

install.packages("rJava", type = "source")
devtools::install_github("JosephCrispell/homoplasyFinder")

Test both packages by loading them into the library:

library(rJava)
library(homoplasyFinder)

Let me know if you have any problems with the above.

Next, would you mind running the following code and sending me the output? This code uses the example data attached to the homoplasyFinder package:

#### Preparation ####
# Load the homoplasyFinder library
library(homoplasyFinder)

# Get the current working directory
workingDirectory <- paste0(getwd(), "/")

# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")

#### Read in the FASTA and tree file ####
# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

#### HomoplasyFinder directly from R ####
# Run HomoplasyFinder in Java from within R (uses rJava)
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile, 
                                                  fastaFile=fastaFile, 
                                                  path=workingDirectory)

# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)

#### HomoplasyFinder jar tool from R ####
# Run the HomoplasyFinder jar tool (does not use rJava)
runHomoplasyFinderJarTool(treeFile=treeFile, fastaFile=fastaFile)

# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)

Thanks very much for trying all of the above and let me know how you get on.

Joe

tanu121093 commented 4 years ago

Hi,

I have run the commands you suggested in R, and I am getting the following error:

library(rJava)
library(homoplasyFinder)
workingDirectory <- "/home/plab/"
date <- "27-11-19"
treeFile <- system.file("extdata", "acinetobacter.tree", package = "homoplasyFinder")
fastaFile <- system.file("extdata", "acinetobacter.fasta", package = "homoplasyFinder")
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile, fastaFile=fastaFile, path=workingDirectory)

  caught segfault address 0x78, cause 'memory not mapped'

Traceback:
 1: .Call(RJavaCheckExceptions, silent)
 2: .jcheck()
 3: .jcall(.rJava.class.loader, "V", "addClassPath", as.character(path))
 4: rJava::.jaddClassPath("inst/java/HomoplasyFinder.jar")
 5: runHomoplasyFinderInJava(treeFile = treeFile, fastaFile = fastaFile, path = workingDirectory)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Also, when I try to read the annotated tree file in R, I get this error:

library(ape)
readAnnotatedTree <- function(path, date) {
  return(ape::read.tree(paste0(path, "annotatedNewickTree_", date, ".tree")))
}
workingDirectory <- "/home/plab/"
tree <- readAnnotatedTree(workingDirectory, date)

  caught segfault address 0x7fffdff9b000, cause 'memory not mapped'
  Segmentation fault (core dumped)

Kindly help me with this.

JosephCrispell commented 4 years ago

Hi,

Thanks for running that code. I notice though that you changed the fasta and tree file names from example.fasta and example.tree to acinetobacter.fasta and acinetobacter.tree. Unfortunately, this won't work properly as these lines of code:

 treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
 fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")

are finding specific example files that I have generated and attached to the homoplasyFinder package.

I know it is a pain but would you be able to re-run the code I provided without any changes and send me any output that you get?

In the meantime, I will have a look into what might be causing a segmentation fault as I have never seen that before!

Thanks for working on this issue with me!

Joe
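
As background to the point above, `system.file()` only looks inside an installed package and returns an empty string when the requested file is not there, so a renamed file silently yields "" and no usable path. The sketch below shows this using the base `stats` package as a stand-in (an assumption so that the example runs without homoplasyFinder installed). The helper `checkInputFile()` is hypothetical.

```r
# A file that exists inside an installed package gives a full path
found <- system.file("DESCRIPTION", package = "stats")

# A file that is not part of the package gives an empty string, not an error
notFound <- system.file("extdata", "acinetobacter.tree", package = "stats")
print(nzchar(notFound))

# Hypothetical helper: stop early with a clear message for empty or missing paths
checkInputFile <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("Input file not found: '", path, "'")
  }
  invisible(path)
}

checkInputFile(found)
# checkInputFile(notFound) would stop with "Input file not found: ''"
```

For your own data, the path is given directly as a quoted string and `system.file()` is not needed.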

tanu121093 commented 4 years ago

Hi,

I tried running the code as given by you:

install.packages("devtools")
devtools::install_github("JosephCrispell/homoplasyFinder")
devtools::install_github("JosephCrispell/basicPlotteR") # Makes annotated plotted phylogeny prettier :-)
library(homoplasyFinder)

fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Get the current working directory
workingDirectory <- paste0(getwd(), "/")

# Run the HomoplasyFinder jar tool
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile, fastaFile=fastaFile, path=workingDirectory)

# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")

# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)

# Read in the annotated tree
tree <- readAnnotatedTree(workingDirectory)

# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile)

I was able to plot your data. When I use my own dataset, I can read in the output table.

But when I try to read in the annotated tree, RStudio shows the following error:

R Session Aborted. R encountered a fatal error. The session was terminated. (All of this was performed on R version 3.6.1.)

I also tried using R version 3.3.3, but the homoplasyFinder package was not available there. Please tell me which versions of R and RStudio to use, and whether any packages need to be installed in R apart from homoplasyFinder and devtools.

JosephCrispell commented 4 years ago

Hi,

Ah ok, then homoplasyFinder is working on your computer. Therefore it is a problem that is specific to your dataset.

Could you try restarting your Rstudio and running:

options(java.parameters = "-Xmx4000m")

This will allow Java to use a bit more RAM (a maximum heap of 4000 MB). Now try running your dataset using runHomoplasyFinderInJava() with your FASTA and tree files.

Also, could you tell me the number of sequences (and their length) in your FASTA file?

Thanks,

Joe
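
One way to answer the question about the number of sequences and their length is a small base R summary of the FASTA file. This is only a sketch: `summariseFasta()` is a hypothetical helper (not part of homoplasyFinder), and the example file is made up.

```r
# Hypothetical helper: list each sequence name and its length using base R only
summariseFasta <- function(fastaFile) {
  lines <- readLines(fastaFile)
  lines <- lines[nzchar(lines)]            # drop blank lines
  isHeader <- startsWith(lines, ">")

  # Assign each sequence line to the header that precedes it
  sequenceIndex <- factor(cumsum(isHeader)[!isHeader],
                          levels = seq_len(sum(isHeader)))

  # Sum the characters per sequence (a sequence may span several lines)
  sequenceLengths <- tapply(nchar(lines[!isHeader]), sequenceIndex, sum)
  sequenceLengths[is.na(sequenceLengths)] <- 0   # headers with no sequence

  data.frame(
    name = sub("^>", "", lines[isHeader]),
    length = as.integer(sequenceLengths),
    stringsAsFactors = FALSE
  )
}

# Made-up example file with two sequences of six bases each
exampleFasta <- tempfile(fileext = ".fasta")
writeLines(c(">seqA", "ACGT", "AC", ">seqB", "ACGTAA"), exampleFasta)
fastaSummary <- summariseFasta(exampleFasta)
print(fastaSummary)
```

`nrow(fastaSummary)` gives the number of sequences, and `unique(fastaSummary$length)` should return a single value if the file is an alignment.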

tanu121093 commented 4 years ago

Hi, thank you so much for the quick replies. I tried the command but I am again facing the same error: my session aborts abruptly. Here is the workflow I followed to produce the input files. I used SPAdes/CLC to assemble the sequence data, then Prokka to generate .gff files, which were used as input for Roary. The Roary output (.aln file) was converted to .fasta (by renaming it) and used as the input file for the homoplasyFinder tool. The .aln file was also converted to .phy using SeaView, and the .phy file was used with PhyML to generate a .nwk file. The .nwk file was converted to .tree (by renaming it), and this .tree file is used as the tree input for homoplasyFinder.

I have a FASTA file with a total size of 44,675 KB, containing 30 genomes.

tanu121093 commented 4 years ago

Hi, I tried running the tool with 5 Acinetobacter genomes (each 5 MB in size), and it ran successfully. But when I use a larger dataset (30 genomes, 45 MB total file size), it shows an error. Is this tool only for smaller datasets?

tanu121093 commented 4 years ago

I could run runHomoplasyFinderInJava() with my FASTA and tree files. But when I read the annotated tree with tree <- readAnnotatedTree(workingDirectory), I get the same error again (R encountered a fatal error; the session terminates).

JosephCrispell commented 4 years ago

Hi,

That is great that you can get the smaller dataset to run!

The problem is therefore the memory required to run homoplasyFinder. How much RAM do you have on your computer?

If you have enough RAM then homoplasyFinder should have no problem analysing a larger dataset. You need to restart your Rstudio. Then put this command at the top of your script and run it before running any other commands:

# Assign 10GB of RAM to JAVA
options(java.parameters = "-Xmx10g")

This will assign 10GB of RAM to Java, which homoplasyFinder uses. (Note that this assumes your computer has more than 10GB available.) Try running your dataset with runHomoplasyFinderInJava() and let me know how you get on.

Good luck and thanks for sticking with it!

Joe
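
The restart matters because the `java.parameters` option is only read when the Java virtual machine starts, so setting it after rJava has been loaded has no effect in that session. A minimal sketch of the order of operations follows. The helper `reportJavaHeapGB()` is hypothetical, needs the rJava package, and is defined here without being called.

```r
# Run this first in a freshly restarted R session, before library(rJava)
# or library(homoplasyFinder) is called
options(java.parameters = "-Xmx10g")

# Confirm the option is set in this session
print(getOption("java.parameters"))

# Hypothetical helper: ask the running JVM for its maximum heap size in GB.
# Calling it starts the JVM (if not already started) via rJava.
reportJavaHeapGB <- function() {
  rJava::.jinit()
  runtime <- rJava::.jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
  rJava::.jcall(runtime, "J", "maxMemory") / 1024^3
}
```

Calling `reportJavaHeapGB()` after loading homoplasyFinder should return a value close to the requested size if the option was picked up.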

tanu121093 commented 4 years ago

Hi,

Thank you for the response.

I have a workstation with an Intel(R) Xeon(R) Gold 5218 CPU (dual processor, 16 cores each), 64 GB of RAM and a 64-bit operating system (Windows 10 Pro).

I tried using the command

# Assign 10GB of RAM to JAVA
options(java.parameters = "-Xmx10g")

I also assigned 32 GB of RAM to Java, but I am again getting the same error (R encountered a fatal error. The session was terminated.). This error occurs when I read the annotated tree:

tree <- readAnnotatedTree(workingDirectory)

In the results file, I am getting 50,000 observations. Kindly suggest some other option.

tanu121093 commented 4 years ago

I have 63 GB of RAM usable on the system.

JosephCrispell commented 4 years ago

Hi,

Just to be clear, is runHomoplasyFinderInJava() now running without any error?

Are the following three files appearing in your working directory?

sequences_noInconsistentSites_02-12-19.fasta
annotatedNewickTree_02-12-19.tree
consistencyIndexReport_02-12-19.txt

Note that, by default, the readAnnotatedTree() function looks for the annotated Newick tree file with today's date in its name.

Would you be able to share your tree and fasta file with me so I can try and get them to work on my computer? I will keep the data confidential and remove them once I am finished.

Joe
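
Because the output file names carry the run date, a quick check that all three files exist for a given date can rule out a date mismatch. The sketch below builds the names listed above and uses the same day-month-year format as the earlier script. The helper `homoplasyFinderOutputs()` is hypothetical and not part of the package.

```r
# Hypothetical helper: build the three output file paths for a given run date
homoplasyFinderOutputs <- function(path, date = format(Sys.Date(), "%d-%m-%y")) {
  c(
    fasta  = file.path(path, paste0("sequences_noInconsistentSites_", date, ".fasta")),
    tree   = file.path(path, paste0("annotatedNewickTree_", date, ".tree")),
    report = file.path(path, paste0("consistencyIndexReport_", date, ".txt"))
  )
}

# Check which outputs from a run on 2 December 2019 are in the working directory
outputs <- homoplasyFinderOutputs(getwd(), date = "02-12-19")
print(file.exists(outputs))
```

If the files exist only under an earlier date, that date has to be used when reading them back in.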

tanu121093 commented 4 years ago

Hi, I could generate all three output files without error. The files are appearing in my working directory. I have mailed you the files on Gmail; kindly reply soon. The only problem I am facing is in reading the annotated output file and plotting the tree in RStudio. Also, let me know if there are any other system requirements.

Also, tell me if the tool is not compatible with the large dataset as I have spent more than a week working on this. Otherwise, I will look for other options.

JosephCrispell commented 4 years ago

Hi,

If you can produce those three files without error then homoplasyFinder has worked on your dataset.

The readAnnotatedTree() function is simply using ape's read.tree() function to read in the annotatedNewickTree_02-12-19.tree file. Could you try running read.tree() directly on this file:

 tree <- ape::read.tree("annotatedNewickTree_02-12-19.tree")

I haven't received your files on Gmail (crispelljoseph@gmail.com). When I do, I will run them immediately and hopefully be able to get the readAnnotatedTree() function to work.

Thanks for your patience!

Joe

JosephCrispell commented 4 years ago

Hi,

I have received your tree and fasta files. I ran them with homoplasyFinder and I am having the same problem as you are with the readAnnotatedTree() function.

This error occurs because there are 50,000 homoplasies in your alignment and the read.tree() function in the ape package can't read a phylogeny with that number of annotations.

I don't have an immediate solution to this problem. However, the main results are in the consistencyIndexReport_02-12-19.txt file, which reports the position of each homoplasy and its consistency with the phylogeny. In addition, the plotAnnotatedTree() function in homoplasyFinder will not work with a phylogeny with 50,000 homoplasies.

So whilst homoplasyFinder can analyse your large dataset and summarise the results it cannot plot the results.

I hope you find the consistencyIndexReport_02-12-19.txt useful and I will have a think about different ways to visualise the results without a phylogeny. You can read the results (consistencyIndexReport_02-12-19.txt) with the following code:

reportFile <- file.path(workingDirectory, "consistencyIndexReport_02-12-19.txt")
reportTable <- read.table(reportFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)

Sorry the plotting of the annotated tree won't work for your big data!

Joe
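
One possible way to look at a large report without a phylogeny is to count homoplasies in windows along the alignment. This is a sketch of that idea and not a homoplasyFinder feature: `countHomoplasiesPerWindow()` is a hypothetical helper, the window size is arbitrary, and the positions below are made up. With real results the positions would come from the position column of the report, whose name should be checked with `names(reportTable)`.

```r
# Hypothetical helper: count positions falling in fixed-width windows
countHomoplasiesPerWindow <- function(positions, windowSize = 1000) {
  # First alignment position of the window containing each homoplasy
  windowStart <- ((positions - 1) %/% windowSize) * windowSize + 1
  counts <- table(windowStart)
  data.frame(
    windowStart = as.integer(names(counts)),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up positions for illustration only
examplePositions <- c(1, 5, 1000, 1001, 2500)
windowCounts <- countHomoplasiesPerWindow(examplePositions, windowSize = 1000)
print(windowCounts)

# A simple bar chart of the counts could then be drawn with:
# barplot(windowCounts$count, names.arg = windowCounts$windowStart)
```

Windows with unusually high counts may be a reasonable place to start looking more closely.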

tanu121093 commented 4 years ago

Thank you so much for helping out. I wish you could have told me that earlier, when I mentioned that the result has 50,000 observations. Nevertheless, thank you for your reply. Kindly add a feature for analysing homoplasies in big datasets, as that would be helpful for the wider community. Please also suggest some tools, if any, to visualise the results in the form of a tree in RStudio.

cls83 commented 4 years ago

Hi,

I'm having a similar issue, is 2303 observations too many? I can run the program, but when I try to plot the tree it crashes. Have tried adding more memory (10G). Log file is attached. hs_err_pid48883.log

JosephCrispell commented 4 years ago

Hi,

Thanks for giving HomoplasyFinder a go! Yes, 2303 observations is too many to visualise. It is trying to make a heatmap, and with 2303 observations multiplied by the number of sequences, that will be too much for R to handle.

It shouldn't crash though, it should just hang. The crashing is an underlying problem with rJava I think (see this issue for more details).

To stop the crashing from happening use the runHomoplasyFinderJarTool() instead of the runHomoplasyFinderInJava() function as runHomoplasyFinderJarTool() does not use the rJava package. Unfortunately, that still won't help you with the visualising.

My advice is to examine the positions where homoplasies occur (identified by homoplasyFinder) and see whether a subset of them are interesting, so that you can learn why the same mutation might be occurring independently multiple times. It may help to think about the protein coding sequences and whether each mutation could be synonymous or non-synonymous.

Thanks for using HomoplasyFinder and let me know how you get on.

Joe
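
As a starting point for examining a subset of positions, the report table can be filtered to a region of interest, such as the coordinates of one gene in the alignment. This sketch assumes the report has a position column. Its name is passed as an argument, and the default "Position" is an assumption to check against `names(reportTable)`. The helper `subsetReportByRegion()` and the example values are hypothetical.

```r
# Hypothetical helper: keep report rows whose position lies within [start, end]
subsetReportByRegion <- function(reportTable, start, end, positionColumn = "Position") {
  if (!positionColumn %in% names(reportTable)) {
    stop("Column '", positionColumn, "' not found; check names(reportTable)")
  }
  positions <- reportTable[[positionColumn]]
  reportTable[positions >= start & positions <= end, , drop = FALSE]
}

# Made-up report rows for illustration only
exampleReport <- data.frame(Position = c(10, 150, 900), stringsAsFactors = FALSE)

# Keep the homoplasies between alignment positions 100 and 500
regionReport <- subsetReportByRegion(exampleReport, start = 100, end = 500)
print(regionReport)
```

A smaller table like this is easier to inspect by hand, for example against the annotation of the gene in question.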