GMOD / Apollo

Genome annotation editor with a Java server backend and a JavaScript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

add more species to icebox server #568

Closed: nathandunn closed this issue 8 years ago

nathandunn commented 9 years ago

This is in support of the monarch project, but will also test the system and provide some good homologous test data.

nathandunn commented 9 years ago

Basically, for each organism, I want a track of genes, and the conservation of the genome, if available. Easy peasy. For human and mouse, there’s some extra stuff, which are regulatory tracks (that’s for another project we’re getting started). For testing purposes, you could just start with one regulatory/enhancer track (rather than the 10-20), and we’ll see how it looks.

Then we should investigate whether it is better to host the features on one or more additional tracks or to dump them en masse onto a JBrowse server. I am not sure which workflow gives the best performance.

These species are core, and in order of importance:

Human (we have this data)

Mouse

Zebrafish

Drosophila

C. elegans

Others that would be nice to have, loosely in order of preference:

Dog (3 vertebrates):

Pig

Cow (compare against other cow):

Chicken

Sheep:

Horse

Cat (3 vertebrates):

nathandunn commented 9 years ago

all data downloaded

cmdcolin commented 9 years ago

Lemme know if you have any questions about loading the datasets from Ensembl/RefSeq/UCSC... there are some tricks sometimes.

nathandunn commented 9 years ago

I have most of it... I have to rewrite some of the FASTA files to create better chromosome names.
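Rewriting FASTA headers is easy to script. Below is a minimal Python sketch, assuming the goal is to replace long Ensembl-style headers (a name followed by a description) with short names such as chr1. The function name rename_fasta_headers, the mapping, and the sample records are hypothetical, not part of Apollo or JBrowse.

```python
from typing import Dict, Iterable, Iterator


def rename_fasta_headers(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines with each '>' header replaced by a short name.

    The lookup key is the header's first whitespace-delimited token; headers
    without an entry in `mapping` are simply trimmed to that token.
    """
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            first = tokens[0] if tokens else ""
            yield ">" + mapping.get(first, first) + "\n"
        else:
            yield line  # sequence lines pass through unchanged


# Illustrative records only (hypothetical headers in an Ensembl-like style).
example = [
    ">1 dna:chromosome illustrative description\n",
    "ACGTACGT\n",
    ">contig_7 hypothetical unplaced scaffold\n",
    "GATTACA\n",
]
renamed = list(rename_fasta_headers(example, {"1": "chr1"}))
print("".join(renamed), end="")
```

Whatever names are chosen, the seqid column (column 1) of the GFF3 tracks would need to use the same names so that features line up with the reference sequences.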

My bigger question is: what is the preferred method for importing the GFF3 gene tracks? I always forget which --type to use, or whether it's even necessary.

For example:

bin/flatfile-to-json.pl --gff /data/jbrowse/monarch/cow2/raw/Bos_taurus.UMD3.1.81.gff3 --out /data/jbrowse/monarch/cow2 --trackLabel Cow2

You can see that these look pretty horrible. I can retry with --type=gene, mRNA, transcript, etc., and I also need to exclude the chromosome:

http://icebox.lbl.gov/Apollo2/jbrowse/index.html?loc=1:151964931..158337067&organism=158569&tracks=Cow2

Anyway, any pointers would be great,

Nathan
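Picking a value for --type starts with knowing which feature types column 3 of the GFF3 actually contains. A minimal Python sketch that tallies them is below; the function name and the sample lines (loosely styled after an Ensembl GFF3) are hypothetical, and this is not a JBrowse or Apollo tool.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Tally GFF3 column 3 (feature type), skipping directives and comments."""
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # blank line, directive, or comment
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3:
            counts[columns[2]] += 1
    return counts


# Hypothetical tab-separated lines, loosely in the style of an Ensembl GFF3.
sample = [
    "##gff-version 3\n",
    "1\t.\tchromosome\t1\t1000\t.\t.\t.\tID=chromosome:1\n",
    "1\tensembl_havana\tgene\t100\t900\t.\t+\t.\tID=gene:G1\n",
    "1\tensembl_havana\ttranscript\t100\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1\n",
    "1\tensembl_havana\texon\t100\t300\t.\t+\t.\tParent=transcript:T1\n",
    "1\tensembl_havana\texon\t500\t900\t.\t+\t.\tParent=transcript:T1\n",
]
for feature_type, n in sorted(count_feature_types(sample).items()):
    print(feature_type, n)
```

On a file like this, seeing transcript rather than mRNA in the tally would point to --type transcript, and the chromosome entry would be the whole-sequence feature to keep out of the track.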


cmdcolin commented 9 years ago

Ensembl GFF3 files normally use "transcript" rather than "mRNA" as the transcript type, so just pass that to the --type argument:

bin/flatfile-to-json.pl --type transcript --gff "sorted gff file" --out /opt/apollo/organism --trackLabel Ensembl_transcripts
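The commands here take a sorted GFF3 as input. If a file needs sorting, the rough Python sketch below groups features by seqid and orders them by start (column 4). It is only an illustration under simplifying assumptions: it drops ### markers, does not handle an embedded ##FASTA section, and compares seqids as strings (so "10" sorts before "2"). The function names are hypothetical.

```python
from typing import List, Tuple


def _sort_key(feature_line: str) -> Tuple[str, int]:
    """Sort key: (seqid, start) taken from GFF3 columns 1 and 4."""
    columns = feature_line.split("\t")
    return columns[0], int(columns[3])


def sort_gff3(lines: List[str]) -> List[str]:
    """Return directives/comments first, then features sorted by (seqid, start).

    Simplifications for this sketch: '###' markers are dropped and an embedded
    '##FASTA' section is not supported.
    """
    header: List[str] = []
    features: List[str] = []
    for line in lines:
        if not line.strip() or line.strip() == "###":
            continue
        if line.startswith("#"):
            header.append(line)
        else:
            features.append(line)
    # list.sort is stable: features with equal (seqid, start) keep their input
    # order, so a parent listed before its children stays ahead of them.
    features.sort(key=_sort_key)
    return header + features


# Hypothetical, deliberately out-of-order input.
unsorted = [
    "##gff-version 3\n",
    "2\t.\tgene\t50\t90\t.\t+\t.\tID=g3\n",
    "1\t.\tgene\t500\t900\t.\t+\t.\tID=g2\n",
    "###\n",
    "1\t.\tgene\t100\t400\t.\t+\t.\tID=g1\n",
]
print("".join(sort_gff3(unsorted)), end="")
```

Dedicated GFF3 tools handle more edge cases; the point here is just what "sorted" means for the input.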

There are also some cool extra filters you can add to the --type argument.

For example, you can load multiple types into one track:

 bin/flatfile-to-json.pl --type transcript,mRNA --gff "sorted gff file" --out /opt/apollo/organism --trackLabel Ensembl_transcripts_and_mRNA

That would load both "transcript" and "mRNA" features (column 3) into one track.

You can also filter on column 2 (source) and column 3 (type) simultaneously, using the form type:source. For example:

 bin/flatfile-to-json.pl --type transcript:ensembl_havana --gff "sorted gff file" --out /opt/apollo/organism --trackLabel Ensembl_havana_transcripts

That would load only the transcripts whose source (column 2) is ensembl_havana into the track.
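To make the --type semantics described above concrete, here is a small Python sketch that applies the same kind of filter to GFF3 lines: comma-separated entries are alternatives, and an entry written type:source must match both column 3 and column 2. This emulates the behavior described in the thread; it is not flatfile-to-json.pl's code, and unlike that script it only checks the matching lines themselves rather than also collecting their child features (exons, CDS). The function names and sample lines are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Filter = Tuple[str, Optional[str]]  # (type from column 3, optional source from column 2)


def parse_type_spec(spec: str) -> List[Filter]:
    """Parse a --type style value such as 'transcript,mRNA' or 'transcript:ensembl_havana'."""
    filters: List[Filter] = []
    for item in spec.split(","):
        item = item.strip()
        if item:
            feature_type, _, source = item.partition(":")
            filters.append((feature_type, source or None))
    return filters


def select_features(gff_lines: Iterable[str], spec: str) -> Iterator[str]:
    """Yield feature lines whose type (and source, if given) matches any filter."""
    filters = parse_type_spec(spec)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete GFF3 feature line
        source, feature_type = columns[1], columns[2]
        if any(feature_type == t and (s is None or source == s) for t, s in filters):
            yield line


# Hypothetical sample lines (tab-separated GFF3).
sample = [
    "##gff-version 3\n",
    "1\t.\tchromosome\t1\t1000\t.\t.\t.\tID=chromosome:1\n",
    "1\tensembl_havana\ttranscript\t100\t900\t.\t+\t.\tID=T1\n",
    "1\tensembl\ttranscript\t200\t800\t.\t-\t.\tID=T2\n",
    "1\thavana\tmRNA\t300\t700\t.\t+\t.\tID=T3\n",
]


def ids(spec: str) -> List[str]:
    """Column 9 of the selected lines, for a compact view of what matched."""
    return [line.rstrip("\n").split("\t")[8] for line in select_features(sample, spec)]


print(ids("transcript"))                 # ['ID=T1', 'ID=T2']
print(ids("transcript,mRNA"))            # ['ID=T1', 'ID=T2', 'ID=T3']
print(ids("transcript:ensembl_havana"))  # ['ID=T1']
```

Because a type filter such as transcript never matches a line whose type is chromosome, selecting by type is presumably also what keeps the whole-chromosome feature out of the track.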

nathandunn commented 9 years ago

Ha, so they were all right! I think I also inherited some of the Apollo trackList decoration, as well.

Nathan


nlwashington commented 9 years ago

tagging myself @nlwashington

nathandunn commented 9 years ago

Discussed with @nlwashington; we looked at human and zebrafish. Changes added to the to-do list.

nathandunn commented 9 years ago

@nlwashington:

I think I want to create a mildly separate build off of 2.0 / master branch that consists of:

But without being dependent on 2.1. It should be pretty quick to get up and running; we can ssh over the existing data.