nathandunn closed this issue 8 years ago.
Basically, for each organism, I want a track of genes and a track of genome conservation, if available. Easy peasy. For human and mouse, there’s some extra stuff: regulatory tracks (that’s for another project we’re getting started). For testing purposes, you could just start with one regulatory/enhancer track (rather than the 10-20), and we’ll see how it looks.
Then, we should investigate whether it is better for us to host features on one or more additional tracks, or to dump them en masse into a JBrowse server. I am not sure which workflow will give the best performance.
These species are core, and in order of importance:
Human (we have this data)
Mouse
Zebrafish
Drosophila
C. elegans
Others that would be nice to have, loosely in order of preference:
Dog (3 vertebrates):
Pig
Cow (compare against other cow):
Chicken
Sheep:
Horse
Cat (3 vertebrates):
all data downloaded
Lemme know if you have any questions about loading the datasets from Ensembl/RefSeq/UCSC... there are some tricks sometimes
Most of it I have... I have to rewrite some of the FASTA files to create better chromosome names.
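One way to do that FASTA cleanup is to cut each header down to its first word, so the Ensembl-style description after the sequence ID disappears. This is a minimal sketch assuming that is the kind of renaming wanted; clean_fasta_headers and its optional prefix argument are hypothetical names, not part of JBrowse or Apollo.

```shell
# Hypothetical helper (not part of JBrowse/Apollo): shorten every FASTA
# header to its first word, optionally adding a prefix such as "chr".
#   ">1 dna:chromosome chromosome:UMD3.1:1:1:158337067:1 REF"  ->  ">1"
# Reads FASTA on stdin, writes FASTA on stdout.
clean_fasta_headers() {
    prefix="${1:-}"
    awk -v prefix="$prefix" '
        /^>/ {
            # $1 is ">ID"; drop the ">" and keep only the ID itself
            print ">" prefix substr($1, 2)
            next
        }
        { print }   # sequence lines pass through unchanged
    '
}
```

For example, clean_fasta_headers chr < genome.fa > genome.renamed.fa (file names hypothetical) would turn ">X ..." into ">chrX". Whatever naming you choose has to match column 1 of the GFF3 loaded against that FASTA.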
My bigger question is: what is the preferred method for importing the GFF3 gene tracks? I always forget which type to use, or whether it's even necessary.
For example: bin/flatfile-to-json.pl --gff /data/jbrowse/monarch/cow2/raw/Bos_taurus.UMD3.1.81.gff3 --out /data/jbrowse/monarch/cow2 --trackLabel Cow2
You can see these look pretty horrible. I can retry with --type gene,mRNA,transcript, etc., and I also need to exclude the chromosome:
http://icebox.lbl.gov/Apollo2/jbrowse/index.html?loc=1:151964931..158337067&organism=158569&tracks=Cow2
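Ensembl GFF3 files include a row of type "chromosome" for each sequence, which is likely what shows up as one huge feature in the track. A minimal sketch for stripping rows of given types before loading, assuming a tab-separated GFF3 on standard input; drop_gff_types is a hypothetical helper name.

```shell
# Hypothetical helper: remove GFF3 rows whose type (column 3) is in a
# comma-separated list such as "chromosome". Comment and pragma lines
# (starting with "#") are passed through unchanged.
drop_gff_types() {
    awk -F '\t' -v drop="$1" '
        BEGIN {
            # Build a lookup table of the types to discard
            n = split(drop, list, ",")
            for (i = 1; i <= n; i++) skip[list[i]] = 1
        }
        /^#/ { print; next }
        !($3 in skip) { print }
    '
}
```

For example, drop_gff_types chromosome < Bos_taurus.UMD3.1.81.gff3 > cow.nochrom.gff3 (output name hypothetical). Passing an explicit --type such as transcript to flatfile-to-json.pl may make this step unnecessary, since chromosome rows would not match that type filter.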
Anyway, any pointers would be great,
Nathan
(In reply to Colin Diesh, Sep 15, 2015, 11:58 AM: https://github.com/GMOD/Apollo/issues/568#issuecomment-140501012)
Ensembl GFF3 files normally use "transcript" instead of "mRNA" as the type for their transcripts, so just pass that to the --type argument.
bin/flatfile-to-json.pl --type transcript --gff "sorted gff file" --out /opt/apollo/organism --trackLabel Ensembl_transcripts
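The "sorted gff file" in these commands can be produced with the common approach of keeping the header lines first and sorting the feature rows by sequence and start position. This is a sketch assuming there is no embedded ##FASTA section in the file (Ensembl GFF3 downloads normally don't have one); sort_gff is a hypothetical helper name.

```shell
# Hypothetical helper: print a GFF3 file's "#" header/pragma lines first,
# then its feature rows sorted by seqid (column 1) and numeric start
# (column 4). Blank lines are dropped. Not safe for files with ##FASTA.
sort_gff() {
    gff="$1"
    # Header and pragma lines ("##gff-version 3", "##sequence-region", ...)
    awk '/^#/' "$gff"
    # Feature rows, sorted on tab-separated fields
    awk 'NF && !/^#/' "$gff" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}
```

For example, sort_gff Bos_taurus.UMD3.1.81.gff3 > cow.sorted.gff3, then pass cow.sorted.gff3 to --gff (output name hypothetical).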
There are also some cool extra filters you can add to the --type argument.
For example, you can also load multiple types into one track
bin/flatfile-to-json.pl --type transcript,mRNA --gff "sorted gff file" --out /opt/apollo/organism --trackLabel Ensembl_transcripts_and_mRNA
That would load features of both type "transcript" and type "mRNA" (column 3).
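Before loading, it can help to check how many rows a comma-separated type list like transcript,mRNA actually matches. A rough preview sketch; count_gff_types is a hypothetical name, and it only counts rows whose column 3 matches. It does not try to reproduce how flatfile-to-json.pl assembles parent/child features.

```shell
# Hypothetical preview helper: count GFF3 rows on stdin whose type
# (column 3) is in a comma-separated list, printing "type<TAB>count"
# for each type that occurs. Output order is unspecified.
count_gff_types() {
    awk -F '\t' -v want="$1" '
        BEGIN {
            n = split(want, list, ",")
            for (i = 1; i <= n; i++) keep[list[i]] = 1
        }
        /^#/ { next }                 # skip headers and pragmas
        ($3 in keep) { count[$3]++ }
        END { for (t in count) print t "\t" count[t] }
    '
}
```

For example, count_gff_types transcript,mRNA < Bos_taurus.UMD3.1.81.gff3 shows whether the file uses "transcript", "mRNA", or both.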
You can also filter on column 2 (source) and column 3 (type) simultaneously, by writing the type as type:source:
bin/flatfile-to-json.pl --type transcript:ensembl_havana --gff "sorted gff file" --out /opt/apollo/organism --trackLabel Ensembl_havana_transcripts
That would load only the transcripts whose source is ensembl_havana into the track.
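To see which type:source strings a file actually contains (rather than guessing names like transcript:ensembl_havana), you can tally columns 3 and 2 together. A minimal sketch; tally_type_source is a hypothetical helper name.

```shell
# Hypothetical helper: read a GFF3 on stdin and print "count<TAB>type:source"
# for every (column 3, column 2) pair, most frequent first.
tally_type_source() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }       # skip headers and malformed rows
        { count[$3 ":" $2]++ }
        END { for (k in count) print count[k] "\t" k }
    ' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

For example, tally_type_source < Bos_taurus.UMD3.1.81.gff3 | head lists the most common pairs; any right-hand value can be passed to --type as-is.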
Ha, so they were all right! I think I also inherited some of the Apollo trackList decoration.
Nathan
(In reply to Colin Diesh, Sep 15, 2015, 12:15 PM: https://github.com/GMOD/Apollo/issues/568#issuecomment-140506023)
tagging myself @nlwashington
Discussion with @nlwashington, looked at human and zebrafish. Changes added to todo-list.
@nlwashington:
I think I want to create a mildly separate build off the 2.0/master branch that consists of:
But without being dependent on 2.1. It should be pretty quick to get up and running; we can ssh over the existing data.
This is in support of the monarch project, but will also test the system and provide some good homologous test data.