deepcelllineage / mitolin

Use mitochondrial sequence from single cells to determine cell lineage relationships
BSD 3-Clause "New" or "Revised" License

Annotate mitochondrial DNA fasta files with genes #8

Status: Open. Deena-B opened this issue 5 years ago.


Aim

Annotate the gene features of mitochondrial DNA.

Background

Gene annotation will let us tokenize the mitochondrial sequences more intelligently, for example by gene rather than by fixed-length chunks. We will start with the small (~16,000 nt) mitochondrial genome. This will lay the groundwork for annotating all expressed genes (mRNA) and, later in the project, the whole genome.
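The following sketch shows what "smarter" tokenization could mean. It assumes we end up with a list of non-overlapping feature intervals (1-based, inclusive, as in GFF). The sequence is cut at feature boundaries so that each gene becomes one token and each gap becomes an intergenic token. The function name `tokenize_by_features` is hypothetical. The toy sequence and features are placeholders, not real mtDNA coordinates.

```python
from typing import List, Tuple

# A feature is (name, start, end) with 1-based, inclusive coordinates.
# The names and coordinates used below are hypothetical placeholders.
Feature = Tuple[str, int, int]


def tokenize_by_features(seq: str, features: List[Feature]) -> List[Tuple[str, str]]:
    """Split seq into (label, subsequence) tokens along feature boundaries.

    Gaps between features are emitted as "intergenic" tokens.
    Assumes features do not overlap and do not wrap around the origin.
    """
    tokens: List[Tuple[str, str]] = []
    pos = 0  # 0-based index of the next base not yet assigned to a token
    for name, start, end in sorted(features, key=lambda f: f[1]):
        s = start - 1  # convert 1-based inclusive start to a 0-based index
        if s > pos:
            tokens.append(("intergenic", seq[pos:s]))
        tokens.append((name, seq[s:end]))
        pos = end
    if pos < len(seq):
        tokens.append(("intergenic", seq[pos:]))
    return tokens


if __name__ == "__main__":
    toy_seq = "AAACCCGGGTTTAAA"
    toy_features = [("geneA", 4, 6), ("geneB", 10, 12)]
    print(tokenize_by_features(toy_seq, toy_features))
    # [('intergenic', 'AAA'), ('geneA', 'CCC'), ('intergenic', 'GGG'), ('geneB', 'TTT'), ('intergenic', 'AAA')]
```

Real mtDNA complicates this in two ways. Some genes overlap (e.g. MT-ATP8 and MT-ATP6), and the D-loop spans the origin of the circular genome's numbering. A fuller version would need rules for both cases.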

See the Science #1 slides in overview/wiki for the difference between mitochondrial and nuclear DNA. Slide 4 shows an image of the 13 protein-coding genes, 22 tRNAs, 2 rRNAs, and the D-loop whose coordinates we want to annotate.
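Once a tool produces an annotation, a quick completeness check against the slide 4 feature list could look like the sketch below. The symbols are the standard HGNC names for human mitochondrial genes. The tool you choose may label them differently (e.g. `COX1` rather than `MT-CO1`), so a name mapping might be needed first. `missing_features` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Expected human mtDNA features, using HGNC gene symbols.
PROTEIN_CODING = [
    "MT-ND1", "MT-ND2", "MT-CO1", "MT-CO2", "MT-ATP8", "MT-ATP6", "MT-CO3",
    "MT-ND3", "MT-ND4L", "MT-ND4", "MT-ND5", "MT-ND6", "MT-CYB",
]
RRNA = ["MT-RNR1", "MT-RNR2"]
TRNA = [
    "MT-TA", "MT-TC", "MT-TD", "MT-TE", "MT-TF", "MT-TG", "MT-TH", "MT-TI",
    "MT-TK", "MT-TL1", "MT-TL2", "MT-TM", "MT-TN", "MT-TP", "MT-TQ", "MT-TR",
    "MT-TS1", "MT-TS2", "MT-TT", "MT-TV", "MT-TW", "MT-TY",
]
CONTROL = ["D-loop"]  # non-coding control region, not a gene

EXPECTED: Dict[str, List[str]] = {
    "protein_coding": PROTEIN_CODING,
    "rRNA": RRNA,
    "tRNA": TRNA,
    "control": CONTROL,
}


def missing_features(annotated_names: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per category, the expected feature names absent from the annotation."""
    found = set(annotated_names)
    return {cat: [n for n in names if n not in found] for cat, names in EXPECTED.items()}
```

For example, `missing_features(["MT-ND1", "MT-RNR1"])` would report 12 missing protein-coding genes, 1 missing rRNA, all 22 tRNAs, and the D-loop.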

Method

Kerrigan Blake, a PhD student in the Lawson Lab, recommended the software ANNOVAR as a good starting point.

You could also search Google with phrases like "annotate genes software packages free" or "SciPy gene annotation" to find which package would work best.
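Whichever tool is chosen, its output will probably need to be read back into Python. GFF3 is a common format for gene annotations. Each feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), and coordinates are 1-based and inclusive. Below is a minimal stdlib-only reader, assuming the annotation is available as GFF3. The function name `parse_gff3` is hypothetical. The default `feature_types` to keep are assumptions and should be checked against the actual file.

```python
from typing import Dict, Iterable, List, Tuple
from urllib.parse import unquote


def parse_gff3(
    lines: Iterable[str],
    feature_types: Tuple[str, ...] = ("gene", "tRNA", "rRNA", "D_loop"),  # assumed type names
) -> List[Dict]:
    """Return one dict per GFF3 feature line whose type is in feature_types."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, ## directives and # comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. sequence after ##FASTA)
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        if ftype not in feature_types:
            continue
        # Attributes look like "ID=g1;Name=foo"; values may be percent-encoded.
        attributes = {
            k: unquote(v)
            for k, v in (kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        }
        records.append({
            "seqid": seqid,
            "type": ftype,
            "start": int(start),  # 1-based, inclusive
            "end": int(end),
            "strand": strand,
            "name": attributes.get("Name", attributes.get("ID", "")),
        })
    return records
```

Reading lines from a file works the same way, e.g. `with open(path) as fh: parse_gff3(fh)`.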

Data

Nine 19 KB files are here: deepcelllineage/data01/gen/nguyen_nc_2018/20190710-320lfastas/ (data01 is a separate repo in the deepcelllineage organization).

A link to the nine files is here: https://github.com/deepcelllineage/data01/tree/master/gen/nguyen_nc_2018/20190710-320lfastas

These .fasta files each contain the full-length mitochondrial genome plus part of chr1. Trimming off the chr1 portion is tracked in a separate issue.

I used `for f in *; do grep -n ">2" "$f"; done` to check that each file continues into chr1. The loop prints the line number of every line containing `>2` in each file.
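Below is a Python counterpart to the grep check. It assumes each file holds several FASTA records, with the chr1 sequence in a record of its own, and lists every header with its line number and the number of bases in that record. `fasta_summary` is a hypothetical helper. The `*.fasta` glob assumes the files keep that extension and that the script runs inside the data directory.

```python
import glob
from typing import Iterable, List, Tuple


def fasta_summary(lines: Iterable[str]) -> List[Tuple[int, str, int]]:
    """Return (header_line_number, header_text, sequence_length) for each record."""
    summary: List[list] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if line.startswith(">"):
            summary.append([lineno, line[1:], 0])  # start a new record
        elif line and summary:
            summary[-1][2] += len(line)  # add bases to the current record
    return [tuple(rec) for rec in summary]


if __name__ == "__main__":
    for path in sorted(glob.glob("*.fasta")):
        with open(path) as fh:
            print(path, fasta_summary(fh))
```

Unlike the grep loop, this reports every header, not only those containing `>2`. The per-record base counts show how much of each file is mitochondrial sequence and how much is chr1.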

Document your work

Please fork and clone this repo and check out a new branch for your work. Then push your branch and open a PR so we can merge your note and files.

Add a note (can be .md or .ipynb) with your solution to mitolin/nb.

Your note should be named as follows:

e.g.:

Questions?

Please put questions related to this issue in this issue thread.

If you want a quick response, post a link to your comment in the #deepcelllineage Slack channel and tag @Deena or others. You can also DM @Deena.

To join Slack, enter your email address here. For questions NOT specifically related to this issue, get in touch through any of the communication methods listed in DCL's overview README.