JosephCrispell / homoplasyFinder

A tool to identify and annotate homoplasies on a phylogeny and sequence alignment
GNU General Public License v3.0
19 stars 3 forks source link

Question about homoplasyFinder output #12

Closed vujex closed 4 years ago

vujex commented 4 years ago

Hi!

I have a question about using HomoplasyFinder on a "larger" dataset. I'm trying to use it on a bacterial phylogenetic tree (900 samples/terminal nodes) containing several lineages. My goal is to see whether there are homoplastic events occurring at the same genomic location in different lineages. And I hope your tool will be useful for that!

I managed to get the right output from homoplasyFinder (consistency index report with in my case approximately 500 positions). Because of the numerous positions, using R to visualise this doesn't really look good.

So, my question is:

Is there is a way to see exactly in which samples the homoplasies are happening in my data (because it's too much to visualise in R)?

And/or

Is it possible to get a file that tells us for each position in which (terminal) nodes we get a A,C,T or G?

Apologies if it's a dumb question...

Thanks a lot!

Regards,

JosephCrispell commented 4 years ago

Hi,

Thanks so much for giving homoplasyFinder a go. Sorry for not getting back to you sooner, I completely missed the alert letting me know about your issue.

I've added a new function (assignSiteAlleles()) that assigns the sample IDs (from the fasta file) to each allele found at each position reported by homoplasyFinder.

You can test the function with the following code:

# Remove the old version of homoplasyFinder
detach("package:homoplasyFinder")
remove.packages("homoplasyFinder")

# Re-install homoplasyFinder
devtools::install_github("JosephCrispell/homoplasyFinder")

# Load homoplasyFinder
library(homoplasyFinder)

# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Get the current working directory
workingDirectory <- paste0(getwd(), "/")

# Run the HomoplasyFinder java code
inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, path=workingDirectory)

# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")

# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)

# Get the site information
siteInfo <- assignSiteAlleles(results, fastaFile)

The siteInfo list stores the following information for each site: (position <- "57")

$Counts$C [1] 0

$Counts$G [1] 139

$Counts$T [1] 16

- `siteInfo[[position]]$Samples` - a list recording the samples with each allele at the current position:

$Samples$A character(0)

$Samples$C character(0)

$Samples$G [1] "128" "140" "76" "169" "190" "24" "295" "291" "105" "1" "193" "254" "289" "171" "211" "4" "37" "23" "17" "116" [21] "133" "296" "8" "36" "61" "69" "110" "147" "179" "299" "43" "203" "206" "208" "279" "294" "39" "79" "298" "81" [41] "124" "225" "101" "120" "146" "155" "156" "199" "60" "219" "250" "269" "285" "13" "29" "117" "188" "88" "93" "233" [61] "242" "31" "78" "83" "102" "114" "132" "26" "44" "52" "98" "158" "191" "213" "265" "276" "15" "42" "65" "201" [81] "221" "222" "273" "53" "72" "99" "112" "118" "126" "139" "157" "178" "198" "217" "300" "5" "22" "104" "111" "130" [101] "165" "172" "181" "209" "241" "282" "286" "25" "113" "144" "18" "38" "73" "163" "170" "262" "48" "153" "202" "74" [121] "192" "131" "141" "177" "197" "260" "152" "216" "232" "284" "97" "108" "275" "135" "82" "224" "253" "47" "89"

$Samples$T [1] "41" "271" "161" "183" "19" "278" "27" "58" "57" "84" "266" "187" "166" "55" "220" "160"



I hope the new `assignSiteAlleles()` functions works well and gets you the information you need.
vujex commented 4 years ago

Great, thanks for adding the "assignSiteAlleles" function. I just tested it and it works perfectly on my data!

JosephCrispell commented 4 years ago

Ah that's great, sorry it took me so long to see your issue and thanks again for using homoplasyFinder 😃