Open iamciera opened 4 years ago
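## For switching out the species names in other files
## (assumes a data frame `dataset` with a `species` column is already loaded)
species_key <- read.csv("../data/montium_species_laneID.csv", stringsAsFactors = FALSE)
species <- species_key$species
lane <- species_key$lane_ID
## replace all the lane IDs with species names
## fixed = TRUE treats each lane ID as literal text rather than a regular expression;
## note that an ID which is a substring of another ID will still match inside it
for (j in seq_along(lane)) {
  dataset$species <- gsub(lane[j], species[j], dataset$species, fixed = TRUE)
}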
Hi Ciera, the IDs you gave me don't match the tree names. The formatting of the species names is different, and some names are missing from both tree files. For example, D. Baimaii is in the dictionary of names but not in any of the tree files.
The vignette I linked uses a genepred file of gene annotations to get their neutral model, and I was wondering whether we need this file.
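One way to pin down the mismatch is to list the names that appear on only one side after normalizing the formatting. This is only a sketch: the name vectors below are made-up stand-ins for the real files, `normalize_name` is a hypothetical helper, and the normalization rule (lowercase, keep only letters and digits) is an assumption about how the two formats differ.

```r
# Made-up example values. In practice key_names would come from
# species_key$species and tip_labels from the tree file
# (e.g. ape::read.tree(file)$tip.label).
key_names  <- c("D. baimaii", "D. kikkawai", "D. leontia")
tip_labels <- c("D_kikkawai", "d_leontia", "D_bocki")

# Hypothetical helper: lowercase and drop everything except letters and digits,
# so "D. kikkawai" and "D_kikkawai" compare as equal.
normalize_name <- function(x) gsub("[^a-z0-9]", "", tolower(x))

key_norm <- normalize_name(key_names)
tip_norm <- normalize_name(tip_labels)

# Names present in the dictionary but absent from the tree, and vice versa.
missing_from_tree <- key_names[!key_norm %in% tip_norm]
missing_from_key  <- tip_labels[!tip_norm %in% key_norm]

print(missing_from_tree)
print(missing_from_key)
```

Running this on the real dictionary and on each tree file separately would show whether the problem is only formatting or whether species are truly absent.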
December 13, 2019
phastweb: GUI that essentially mimics script and command line workflow.
Link: https://github.com/DiscoveryDNA/montium_5_TFBS_evolution/blob/master/R/phastcons.Rmd
Goal: Learn how to use phastcons for scoring conservation.
Conservation is basically a score that estimates how similar the sequences are. Phastcons was the only package that incorporated a tree to accomplish this, so I think we should start here and possibly, down the line, implement our own algorithm. But first, we need to understand what their program does. The best way to do this is to go through tutorials online, apply our data to them, and read papers that use it.
We want to 1) look at "conservation score" across whole alignment files, 2) look at it across specific motifs, and 3) learn the best way to visualize the scores.
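To make "how similar the sequences are" concrete, here is a deliberately naive per-column score in base R. It is not what phastcons computes: phastcons fits a phylogenetic hidden Markov model that uses the tree, whereas this sketch ignores the tree entirely and just measures column agreement. The sequences and the `column_identity` helper are made up for illustration.

```r
# Toy alignment (made-up sequences), one string per species, all the same length.
aln <- c(sp1 = "ACGTAC-T",
         sp2 = "ACGTTC-T",
         sp3 = "ACGAACGT")

# Character matrix: rows = species, columns = alignment positions.
aln_mat <- do.call(rbind, strsplit(aln, ""))

# Naive score: fraction of sequences that carry the most common character
# in each column. Gaps ("-") are counted like any other character here.
column_identity <- function(m) {
  apply(m, 2, function(col) max(table(col)) / length(col))
}

scores <- column_identity(aln_mat)
print(round(scores, 2))
```

A score of 1 means every species agrees at that position. Comparing something like this against the phastcons output on the same alignment may help build intuition for what the tree adds.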
1. Whole Alignment Files
We want to compute conservation on the entire alignment file, as a way to normalize the dataset and get a global view of each region. For example, we can ask questions like "Are the enhancers with positive function (enhancer_func == 1) more conserved than those without (enhancer_func == 0)?"
The alignment files are located here: https://drive.google.com/open?id=1UEXg0QMDFKIrvwnTxo64t2AWseYOCfD9
The trees are located here: https://github.com/DiscoveryDNA/montium_5_TFBS_evolution/tree/master/data/tree.
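Once each region has a single summary score, the enhancer_func question becomes a two-group comparison. The sketch below uses simulated placeholder numbers, not real results; the `regions` data frame, the `mean_score` column, and the choice of a Wilcoxon rank-sum test are all assumptions for illustration.

```r
# Simulated placeholder data, NOT real results: one mean conservation
# score per enhancer region plus its functional label.
set.seed(1)
regions <- data.frame(
  enhancer_func = rep(c(1, 0), each = 20),
  mean_score    = c(rnorm(20, mean = 0.6, sd = 0.1),
                    rnorm(20, mean = 0.5, sd = 0.1))
)

# Median score per group (names are "0" and "1").
group_medians <- tapply(regions$mean_score, regions$enhancer_func, median)

# Wilcoxon rank-sum test: are scores shifted between the two groups?
test <- wilcox.test(mean_score ~ enhancer_func, data = regions)

print(group_medians)
print(test$p.value)
```

A rank-based test is used here because conservation scores are bounded and may not be normally distributed; a t-test or a regression on region length could be reasonable alternatives.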
The best tutorial I found is below, but feel free to look around. I will add more on this issue.
2. Single Motifs
I am not sure of the best way to approach this. It could be the same as doing the whole alignment, but I will have to look into it more. Please feel free to explore this yourself as well.
3. Visualize
At this point, the best way to accomplish this is to play around with the data while doing the tutorial, but make sure you take notes on possible visualization options as you work.
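As one possible starting point, per-position scores can be plotted along the alignment with a smoothed line on top. This is a base R sketch on random placeholder values, not real conservation output; the window size of 11 is an arbitrary assumption.

```r
# Random placeholder values standing in for real per-position scores.
set.seed(2)
scores <- runif(200)

# Centered moving average; positions without a full window come back as NA.
window <- 11
smoothed <- as.numeric(stats::filter(scores, rep(1 / window, window), sides = 2))

# Raw scores in grey, smoothed trend in black.
plot(scores, type = "l", col = "grey70",
     xlab = "Alignment position", ylab = "Conservation score")
lines(smoothed, lwd = 2)
```

Other options worth noting while working through the tutorial include heatmaps across regions or marking motif positions on the same axis.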
Other resources
Please use comments below to add resources, ask questions, and provide comments relating to conservation.