Closed vujex closed 4 years ago
Hi,
Thanks so much for giving homoplasyFinder
a go. Sorry for not getting back to you sooner, I completely missed the alert letting me know about your issue.
I've added a new function (assignSiteAlleles()
) that assigns the sample IDs (from the fasta
file) to each allele found at each position reported by homoplasyFinder
.
You can test the function with the following code:
# Remove the old version of homoplasyFinder
detach("package:homoplasyFinder")
remove.packages("homoplasyFinder")
# Re-install homoplasyFinder
devtools::install_github("JosephCrispell/homoplasyFinder")
# Load homoplasyFinder
library(homoplasyFinder)
# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")
# Get the current working directory
workingDirectory <- paste0(getwd(), "/")
# Run the HomoplasyFinder java code
inconsistentPositions <- runHomoplasyFinderInJava(treeFile, fastaFile, path=workingDirectory)
# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")
# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)
# Get the site information
siteInfo <- assignSiteAlleles(results, fastaFile)
The siteInfo
list stores the following information for each site: (position <- "57"
)
siteInfo[[position]]$Position
- numeric position in fasta sequence alignmentsiteInfo[[position]]$ConsistencyIndex
- consistency index calculated by homoplasyFinder
siteInfo[[position]]$MinimumNumberChangesOnTree
- the minimum number of changes required on phylogeny to explain patterns observed at positionsiteInfo[[position]]$Counts
- a list recording the number of samples found with each allele at the current position:
$Counts$A
[1] 0
$Counts$C [1] 0
$Counts$G [1] 139
$Counts$T [1] 16
- `siteInfo[[position]]$Samples` - a list recording the samples with each allele at the current position:
$Samples$A character(0)
$Samples$C character(0)
$Samples$G [1] "128" "140" "76" "169" "190" "24" "295" "291" "105" "1" "193" "254" "289" "171" "211" "4" "37" "23" "17" "116" [21] "133" "296" "8" "36" "61" "69" "110" "147" "179" "299" "43" "203" "206" "208" "279" "294" "39" "79" "298" "81" [41] "124" "225" "101" "120" "146" "155" "156" "199" "60" "219" "250" "269" "285" "13" "29" "117" "188" "88" "93" "233" [61] "242" "31" "78" "83" "102" "114" "132" "26" "44" "52" "98" "158" "191" "213" "265" "276" "15" "42" "65" "201" [81] "221" "222" "273" "53" "72" "99" "112" "118" "126" "139" "157" "178" "198" "217" "300" "5" "22" "104" "111" "130" [101] "165" "172" "181" "209" "241" "282" "286" "25" "113" "144" "18" "38" "73" "163" "170" "262" "48" "153" "202" "74" [121] "192" "131" "141" "177" "197" "260" "152" "216" "232" "284" "97" "108" "275" "135" "82" "224" "253" "47" "89"
$Samples$T [1] "41" "271" "161" "183" "19" "278" "27" "58" "57" "84" "266" "187" "166" "55" "220" "160"
I hope the new `assignSiteAlleles()` functions works well and gets you the information you need.
Great, thanks for adding the "assignSiteAlleles" function. I just tested it and it works perfectly on my data!
Ah that's great, sorry it took me so long to see your issue and thanks again for using homoplasyFinder
😃
Hi!
I have a question about using HomoplasyFinder on a "larger" dataset. I'm trying to use it on a bacterial phylogenetic tree (900 samples/terminal nodes) containing several lineages. My goal is to see whether there are homoplastic events occurring at the same genomic location in different lineages. And I hope your tool will be useful for that!
I managed to get the right output from homoplasyFinder (consistency index report with in my case approximately 500 positions). Because of the numerous positions, using R to visualise this doesn't really look good.
So, my question is:
Is there is a way to see exactly in which samples the homoplasies are happening in my data (because it's too much to visualise in R)?
And/or
Is it possible to get a file that tells us for each position in which (terminal) nodes we get a A,C,T or G?
Apologies if it's a dumb question...
Thanks a lot!
Regards,