JosephCrispell / homoplasyFinder

A tool to identify and annotate homoplasies on a phylogeny and sequence alignment
GNU General Public License v3.0

Error in .jcheck() : AttachCurrentThread failed! (result:-1) #17

Closed WangYue0926 closed 3 years ago

WangYue0926 commented 3 years ago

Hi,

I got this error when I ran the HomoplasyFinder jar tool. Could you help with this? Thank you very much!

JosephCrispell commented 3 years ago

Hi,

Thanks for using homoplasyFinder. Could you provide a little more information about the problem you are having, so we can figure out what's going wrong? Would you be able to answer the following questions:

WangYue0926 commented 3 years ago

Hi, thank you so much for your reply. I found that this problem was caused by the wrong FASTA file format.

I provided the ML trees constructed by RAxML and the multiple sequence alignment FASTA files. I have 2 WGS datasets of a fungal species, one with 28 samples (3,681 sites) and the other with 123 samples (8,379 sites). FreeBayes was used to call SNPs, and the filter criteria were 'DP > 10 & MQ > 30 & QD > 20'.

I used R to run the program.

I could only generate the report files; when I tried to generate the phylogenetic trees, R crashed. For the two datasets, I got 315 and 1,278 SNPs with CI lower than 1, respectively. I thought R crashed because the results were too large.

Now I am concerned about the results. The homoplasy rates seem very high in my samples. If I do not filter the variant calling results, 4,750 and 16,248 SNP sites are detected for the two datasets, of which 615 and 4,337 are homoplasy sites. The homoplasy rates are around 10%. Is this normal?
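For reference, the proportions implied by the counts quoted in this comment can be worked out directly. This is only a sketch: it assumes the homoplasy rate means inconsistent sites divided by total sites, which the thread does not define, and it pairs the counts with the datasets in the order they are listed.

```r
# Counts quoted in this comment: sites with CI < 1 and total sites per dataset.
counts <- data.frame(
  dataset      = c("28 samples, filtered", "123 samples, filtered",
                   "28 samples, unfiltered", "123 samples, unfiltered"),
  inconsistent = c(315, 1278, 615, 4337),
  total        = c(3681, 8379, 4750, 16248)
)

# Assumption: rate = inconsistent sites / total sites, expressed as a percentage.
counts$ratePercent <- round(100 * counts$inconsistent / counts$total, 1)
print(counts)
```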

JosephCrispell commented 3 years ago

Hi,

Just to check: has the original problem been resolved, and was it caused by an incorrect FASTA file format?

With regards to R crashing, this is a known issue that I haven't had time to solve but hopefully will soon. In the meantime, homoplasyFinder should work nicely from the command line. Alternatively, runHomoplasyFinderJarTool() in R runs homoplasyFinder without using rJava, which should mean R doesn't crash.

Note that by default both the runHomoplasyFinderInJava() and runHomoplasyFinderJarTool() functions in R generate the annotated phylogenetic tree (createAnnotatedNewickTree=TRUE). Check the path that you provided: the tree file should be there, and it can be opened in any phylogenetic tree viewing software.

Lastly, on your homoplasy rates: the number of homoplasies identified in a phylogeny depends on the organism. If it has a high substitution rate and a small genome, and/or high levels of recombination, then more homoplasies will be present. Generally 10% isn't too many and I wouldn't worry too much about their impact on your analyses. If you want to remove them, both of the above functions by default (createFasta=TRUE) also create a FASTA file without the inconsistent sites, which can be used in further analyses.

WangYue0926 commented 3 years ago

Hi,

Thank you for your quick reply.

Yes, the problem was caused by the FASTA file format. I forgot to add the sample number and SNP number in the first line of the FASTA file. Also, the separator between the two values must be a space, not a tab.
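The header requirement described here can be checked before running the tool. The following is a minimal sketch, assuming the first line holds the number of sequences and the number of sites as two integers separated by one space, and that each sequence sits on a single unwrapped line. Both helper functions are hypothetical and are not part of homoplasyFinder.

```r
# Hypothetical helper: TRUE if the first line is "<integer> <integer>" with one space.
hasValidHeader <- function(fastaFile) {
  firstLine <- readLines(fastaFile, n = 1)
  length(firstLine) == 1 && grepl("^[0-9]+ [0-9]+$", firstLine)
}

# Hypothetical helper: write a copy of the FASTA file with the header line added.
# Assumes the input has no header yet and that sequences are not wrapped.
addHeaderLine <- function(inputFasta, outputFasta) {
  lines <- readLines(inputFasta)
  isName <- startsWith(lines, ">")
  nSequences <- sum(isName)
  nSites <- nchar(lines[which(!isName)[1]])
  writeLines(c(paste(nSequences, nSites), lines), outputFasta)
}

# Demonstration on a toy alignment written to temporary files.
rawFasta <- tempfile(fileext = ".fasta")
fixedFasta <- tempfile(fileext = ".fasta")
writeLines(c(">sampleA", "ACGT", ">sampleB", "ACGA"), rawFasta)

validBefore <- hasValidHeader(rawFasta)
addHeaderLine(rawFasta, fixedFasta)
validAfter <- hasValidHeader(fixedFasta)
headerLine <- readLines(fixedFasta, n = 1)
print(c(before = validBefore, after = validAfter))
```

A tab between the two values would fail the check, because the pattern only accepts a single space.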

Both runHomoplasyFinderInJava() and runHomoplasyFinderJarTool() worked fine in RStudio. However, when I ran the command tree <- readAnnotatedTree(workingDirectory), the R session aborted. I have attached the error report for your reference.

For the species I tested, the substitution rate is around 5e-5, and the genome size is around 12.7 Mb. So far, only limited mating events have been observed. Does the homoplasy rate indicate that there are high levels of recombination in this species, and that mating events might occur within it? Another question I have is about the Homoplasy Index (HI). I used 1 - CI as HI and found that many sites with high HI, e.g. HI > 0.9, have a high MinimumNumberChangesOnTree. Are these sites trustworthy? It seems that these sites did not contribute much to the topology of the phylogeny. Thanks~

hs_err_pid3485.log

JosephCrispell commented 3 years ago

Hi,

The readAnnotatedTree(workingDirectory) is a very simple function - here's the code:

# Note: 'path' must end with a file separator, because paste0() joins it directly to the file name
readAnnotatedTree <- function(path, date=format(Sys.Date(), "%d-%m-%y")){
  return(ape::read.tree(paste0(path, "annotatedNewickTree_", date, ".tree")))
}

The crashing issue is likely to be caused by rJava hanging after running homoplasyFinder. You should be able to read the annotated tree in by closing RStudio/R and starting a new session. Then note the directory where your tree and FASTA files are and run readAnnotatedTree() with something like the following code:

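# readAnnotatedTree() pastes the path directly onto the file name, so end it with a separator
workingDirectory <- paste0(file.path("path", "to", "DirectoryWithFASTAAndTreeFiles"), "/")
# Use the date (dd-mm-yy) on which the analysis completed
tree <- readAnnotatedTree(workingDirectory, date="18-11-20")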

Note that if you have successfully run either runHomoplasyFinderInJava() or runHomoplasyFinderJarTool(), you don't need to run it again. You can just provide the date the analysis completed to read in the annotated tree file. You can read in the results file with:

# file.path() inserts the separator between the directory and the file name
resultsFile <- file.path(workingDirectory, "consistencyIndexReport_18-11-20.txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)
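If you are not sure which date to pass, the date stamps can be recovered from the output file names. This is a minimal sketch, assuming the tree files follow the annotatedNewickTree_<date>.tree naming used in readAnnotatedTree() above. The helper findAnnotatedTreeDates() is hypothetical and not part of homoplasyFinder.

```r
# Hypothetical helper: return the date stamps of annotated tree files in a directory.
findAnnotatedTreeDates <- function(path) {
  files <- list.files(path, pattern = "^annotatedNewickTree_.*\\.tree$")
  sub("^annotatedNewickTree_(.*)\\.tree$", "\\1", files)
}

# Demonstration in a temporary directory holding an empty placeholder file.
demoDirectory <- tempfile("homoplasyFinderDemo")
dir.create(demoDirectory)
invisible(file.create(file.path(demoDirectory, "annotatedNewickTree_18-11-20.tree")))

availableDates <- findAnnotatedTreeDates(demoDirectory)
print(availableDates)
```

Each returned value can then be supplied as the date argument of readAnnotatedTree().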

The homoplasy rate indicates how many sites are inconsistent with the phylogenetic tree. Unfortunately it doesn't provide any indication of the cause. The cause can be sequencing errors, non-robust phylogeny construction, recombination, evolutionary convergence, high mutation rates, and many more. The best thing to do is examine the characteristics of the sites that are inconsistent and see what you can learn.

Similarly, the consistency index doesn't tell you much about how robust a site is; you are best looking at the sequencing data and at where the sites appear on the genome to answer that. It's great that the inconsistent sites don't seem to affect the phylogeny: that means there must be a strong phylogenetic signal in the remaining sites, which is important if you are using the phylogeny in further analyses.
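As a starting point for examining the inconsistent sites, the report table can be filtered and ranked. The sketch below uses a toy data frame with invented values rather than real output. The column name ConsistencyIndex is an assumption about the report header, MinimumNumberChangesOnTree is the name quoted earlier in this thread, and HI is taken as 1 - CI as described above.

```r
# Toy stand-in for the report table; the values are invented for illustration.
# Column names are assumptions and should be checked against names(results).
results <- data.frame(
  Position = c(10, 250, 4031, 7120),
  ConsistencyIndex = c(1.0, 0.5, 0.1, 0.25),
  MinimumNumberChangesOnTree = c(1, 2, 10, 4)
)

# Homoplasy index as used in this thread: HI = 1 - CI.
results$HomoplasyIndex <- 1 - results$ConsistencyIndex

# Keep only sites inconsistent with the tree (CI < 1), most homoplasious first.
inconsistent <- results[results$ConsistencyIndex < 1, ]
inconsistent <- inconsistent[order(-inconsistent$HomoplasyIndex), ]
print(inconsistent)
```

The Position column can then be used to look up each site in the alignment and on the genome.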

WangYue0926 commented 3 years ago

Your reply helps a lot. Thank you very much!