Aim
Annotate the gene features of mitochondrial DNA.
Background
Gene annotation will allow us to tokenize the mitochondrial sequences more intelligently. We will start with the small (roughly 16,000 nt) mitochondrial genome. This will lay the groundwork for annotating all expressed mRNA genes and, later in the project, the whole genome.
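One way to read "smarter" tokenization is to cut a sequence at annotated feature boundaries rather than, say, into fixed-length k-mers. This is a minimal sketch under stated assumptions: linear 1-based inclusive coordinates with no wrap-around, and hypothetical feature names. `segment_by_features` is an illustrative helper, not part of any library. Real mitochondrial genes can overlap (ATP8 and ATP6, for example), and this sketch emits each feature whole, so overlapping bases appear in both chunks.

```python
from typing import List, Tuple


def segment_by_features(seq: str, features: List[Tuple[str, int, int]]) -> List[Tuple[str, str]]:
    """Split seq into (label, subsequence) chunks.

    features: hypothetical (name, start, end) tuples, 1-based and inclusive.
    Stretches not covered by any feature are labelled "intergenic".
    """
    segments: List[Tuple[str, str]] = []
    pos = 1  # next 1-based position not yet covered by an emitted chunk
    for name, start, end in sorted(features, key=lambda f: f[1]):
        if start > pos:
            # Gap before this feature becomes an intergenic chunk.
            segments.append(("intergenic", seq[pos - 1:start - 1]))
        # Emit the feature whole, even if it overlaps the previous one.
        segments.append((name, seq[start - 1:end]))
        pos = max(pos, end + 1)
    if pos <= len(seq):
        # Trailing sequence after the last feature.
        segments.append(("intergenic", seq[pos - 1:]))
    return segments
```

For example, `segment_by_features("AAACCCGGGTTT", [("geneA", 4, 6), ("geneB", 10, 12)])` returns alternating intergenic and gene chunks of three bases each.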
In overview/wiki, see the Science #1 slides for the difference between mitochondrial and nuclear DNA, and slide 4 for an image of the 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and the D-loop whose coordinates we want to annotate.
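The annotation target on slide 4 can be written down as a checkable inventory. This is a minimal sketch that assumes 1-based inclusive coordinates. `MitoFeature`, `EXPECTED_COUNTS`, and `count_mismatches` are hypothetical names, not part of ANNOVAR or any other package. Because mtDNA is circular (in the standard human reference numbering the D-loop spans position 1), a feature's end may be smaller than its start, and the length calculation allows for that.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Counts from the Science #1 slides (slide 4): 13 proteins, 22 tRNA, 2 rRNA, 1 D-loop.
EXPECTED_COUNTS: Dict[str, int] = {"protein": 13, "tRNA": 22, "rRNA": 2, "D-loop": 1}


@dataclass(frozen=True)
class MitoFeature:
    """One annotated feature (hypothetical record layout)."""
    name: str
    kind: str    # expected to be one of the EXPECTED_COUNTS keys
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive; smaller than start if the feature wraps past the origin
    strand: str  # "+" or "-"

    def length(self, genome_length: int) -> int:
        """Feature length in nt, counting across the origin for wrapping features."""
        if self.end >= self.start:
            return self.end - self.start + 1
        return (genome_length - self.start + 1) + self.end


def count_mismatches(features: Iterable[MitoFeature]) -> Dict[str, Tuple[int, int]]:
    """Return {kind: (found, expected)} for every kind whose count differs from EXPECTED_COUNTS."""
    found = Counter(f.kind for f in features)
    kinds = set(EXPECTED_COUNTS) | set(found)
    return {
        kind: (found[kind], EXPECTED_COUNTS.get(kind, 0))
        for kind in kinds
        if found[kind] != EXPECTED_COUNTS.get(kind, 0)
    }
```

On a finished annotation, `count_mismatches` should return an empty dict. Anything else points to a missing, extra, or mislabelled feature.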
Method
ANNOVAR was recommended as a good starting place by Kerrigan Blake, a PhD student in the Lawson Lab.
You could also search Google with phrases such as "annotate genes software packages free" or "SciPy gene annotation" to find the best-suited package.
Data
The nine 19 KB files are in deepcelllineage/data01/gen/nguyen_nc_2018/20190710-320lfastas/ (data01 is a separate repo within the deepcelllineage organization).
A link to the nine files is here: https://github.com/deepcelllineage/data01/tree/master/gen/nguyen_nc_2018/20190710-320lfastas
These are FASTA files containing the full-length mitochondrial genome plus part of chr1. A separate issue covers trimming off the chr1 part.
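To see where the mitochondrial sequence ends and chr1 begins in each file, you can list every FASTA record with its length. This is a minimal, standard-library sketch. It assumes the chrM and chr1 parts are separate records, each with its own `>` header line. That assumption is inferred from the `>2` pattern grepped for in this issue, not confirmed. `parse_fasta` and `summarize_dir` are hypothetical helper names. If Biopython is installed, `Bio.SeqIO.parse(handle, "fasta")` does the same parsing.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without its leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore any sequence text before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_dir(directory: str) -> None:
    """Print file name, record header, and record length for every record in every visible file."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and not path.name.startswith("."):
            with open(path) as handle:
                for header, seq in parse_fasta(handle):
                    print(f"{path.name}\t{header}\t{len(seq)} nt")
```

Pointing `summarize_dir` at a local clone of the 20190710-320lfastas directory should print one line per record, and the record lengths should show where chrM ends.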
I used
for f in *; do grep -Hn ">2" "$f"; done
to check that each file extends into chr1 (-H prints the file name next to each matching line).
Document your work
Please fork and clone this repo, check out a branch for your work, then push it and open a PR so we can merge your note and files.
Add a note (.md or .ipynb) with your solution in mitolin/nb.
Your note should be named as follows:
DATE-issue#-shortdescription.ext
e.g.:
20190701-i02-extract-chrM-fa.md
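A small helper can build note names and check them against this pattern. This sketch assumes the following: the date is YYYYMMDD, the issue number is zero-padded to two digits (as in i02), the description uses only letters, digits, and hyphens, and the extension is .md or .ipynb. `make_note_name` and `NOTE_NAME_RE` are hypothetical names.

```python
import re
from datetime import date

# DATE-issue#-shortdescription.ext, e.g. 20190701-i02-extract-chrM-fa.md
NOTE_NAME_RE = re.compile(r"\d{8}-i\d{2,}-[A-Za-z0-9-]+\.(md|ipynb)")


def make_note_name(day: date, issue: int, description: str, ext: str = "md") -> str:
    """Build a note filename; the issue number is zero-padded to two digits as in the example."""
    name = f"{day:%Y%m%d}-i{issue:02d}-{description}.{ext}"
    if not NOTE_NAME_RE.fullmatch(name):
        raise ValueError(f"not a valid note name: {name}")
    return name
```

For example, `make_note_name(date(2019, 7, 1), 2, "extract-chrM-fa")` returns `"20190701-i02-extract-chrM-fa.md"`.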
Questions?
Please post questions about this issue in this thread.
For a quick response, post a link to your comment in the #deepcelllineage Slack channel and tag @Deena or others. You can also DM @Deena.
To join Slack, enter your email address here. For questions NOT specifically related to this issue, get in touch through any of the communication methods listed in DCL's overview README.