GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

Create pseudogene glyph #1106

Closed cmdcolin closed 6 years ago

cmdcolin commented 6 years ago

The issue discussed in #1075 highlighted that representing pseudogenes is a little tricky.

This matters especially for a track that has both genes and pseudogenes. In CanvasFeatures, it would be good to dispatch to the Gene glyph for gene feature types and to a Pseudogene glyph for pseudogene features. Pseudogenes don't share the same structure as genes (they lack CDS, for example), so their structure is not captured well.
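
The dispatch described above could be sketched as a glyph callback. This is only a rough illustration, assuming the track's `glyph` setting is given a function that receives the feature and returns a glyph module path. `chooseGlyph` and `GLYPH_BY_TYPE` are names made up for this example, and the `Pseudogene` path is hypothetical since that glyph is what this issue proposes.

```javascript
// Map feature types to glyph module paths.
// 'Gene' and 'Box' are existing JBrowse 1 glyphs; 'Pseudogene' is
// HYPOTHETICAL (the glyph proposed in this issue, not an existing module).
const GLYPH_BY_TYPE = {
  gene: 'JBrowse/View/FeatureGlyph/Gene',
  pseudogene: 'JBrowse/View/FeatureGlyph/Pseudogene', // hypothetical
};

// Anything that is not a gene or pseudogene falls back to a plain box.
const FALLBACK_GLYPH = 'JBrowse/View/FeatureGlyph/Box';

// Illustrative callback: pick a glyph from the feature's "type" attribute.
function chooseGlyph(feature) {
  const type = feature.get('type');
  return Object.prototype.hasOwnProperty.call(GLYPH_BY_TYPE, type)
    ? GLYPH_BY_TYPE[type]
    : FALLBACK_GLYPH;
}
```

Keeping the mapping in a plain lookup table means a mixed gene/pseudogene track only needs one extra entry once a pseudogene glyph actually exists.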

cmdcolin commented 6 years ago

Another consideration is that gene-level annotations can contain both processed transcript children and non-coding children. The gene glyph could be extended to handle this. Currently it just displays a box for the non-coding children, but it could be coded to display the "segments" glyph instead (and maybe color it differently).


Some basic code for this is here: https://github.com/GMOD/jbrowse/tree/unprocessed_transcript_glyph

A weird corner case: the style.color of the higher-level feature has trouble overriding the goldenrod color of the lower-level box glyphs.

cmdcolin commented 6 years ago

Other things you might see in an NCBI GFF that make JBrowse glyphs render badly:

"gene" feature with 0 subfeatures

gunzip -c GRCh38_latest_genomic.gff.gz|grep PFN1P10
NC_000001.11    Curated Genomic gene    21459424        21460202        .       -       .       ID=gene632;Dbxref=GeneID:767853,HGNC:HGNC:42985;Name=PFN1P10;description=profilin 1 pseudogene 10;gbkey=Gene;gene=PFN1P10;gene_biotype=pseudogene;pseudo=true

"gene" feature with direct exon children

gunzip -c GRCh38_latest_genomic.gff.gz|grep CROCCP5
NC_000001.11    Curated Genomic gene    21434320        21436826        .       +       .       ID=gene630;Dbxref=GeneID:100421114,HGNC:HGNC:43865;Name=CROCCP5;description=ciliary rootlet coiled-coil%2C rootletin pseudogene 5;gbkey=Gene;gene=CROCCP5;gene_biotype=pseudogene;pseudo=true
NC_000001.11    Curated Genomic exon    21434320        21434540        .       +       .       ID=id26949;Parent=gene630;Dbxref=GeneID:100421114,HGNC:HGNC:43865;gbkey=exon;gene=CROCCP5
NC_000001.11    Curated Genomic exon    21435298        21435394        .       +       .       ID=id26950;Parent=gene630;Dbxref=GeneID:100421114,HGNC:HGNC:43865;gbkey=exon;gene=CROCCP5
NC_000001.11    Curated Genomic exon    21436067        21436150        .       +       .       ID=id26951;Parent=gene630;Dbxref=GeneID:100421114,HGNC:HGNC:43865;gbkey=exon;gene=CROCCP5
NC_000001.11    Curated Genomic exon    21436576        21436826        .       +       .       ID=id26952;Parent=gene630;Dbxref=GeneID:100421114,HGNC:HGNC:43865;gbkey=exon;gene=CROCCP5
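
The two cases above can be told apart by looking at what sits directly under each "gene" row. Here is a minimal sketch, assuming tab-separated GFF3 text with `ID` and `Parent` attributes in column 9. `classifyGenes` and its category labels are made up for illustration and are not part of JBrowse.

```javascript
// Classify each "gene" row in GFF3 text by the types of its direct children.
function classifyGenes(gff3Text) {
  const genes = new Map(); // gene ID -> Set of direct child feature types
  const rows = [];

  // First pass: parse rows and register every gene ID.
  for (const line of gff3Text.split('\n')) {
    if (!line.trim() || line.startsWith('#')) continue; // skip blanks and directives
    const cols = line.split('\t');
    if (cols.length < 9) continue; // not a feature row
    const attrs = {};
    for (const pair of cols[8].split(';')) {
      const i = pair.indexOf('=');
      if (i > 0) attrs[pair.slice(0, i)] = pair.slice(i + 1);
    }
    rows.push({ type: cols[2], attrs });
    if (cols[2] === 'gene' && attrs.ID) genes.set(attrs.ID, new Set());
  }

  // Second pass: record the type of every feature whose Parent is a gene.
  for (const { type, attrs } of rows) {
    for (const parent of (attrs.Parent || '').split(',')) {
      if (genes.has(parent)) genes.get(parent).add(type);
    }
  }

  // Label each gene with one of three illustrative categories.
  const result = {};
  for (const [id, childTypes] of genes) {
    if (childTypes.size === 0) result[id] = 'no subfeatures';
    else if (childTypes.has('exon')) result[id] = 'direct exon children';
    else result[id] = 'other children';
  }
  return result;
}
```

Run on the PFN1P10 and CROCCP5 records shown above, this would label gene632 as 'no subfeatures' and gene630 as 'direct exon children'.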

cmdcolin commented 6 years ago

The issue of genes without subfeatures was actually fixed by a change that @rbuels made (in the release notes, @rbuels mentioned "Fixed a bug in which feature labels would sometimes be repeated across the view, in the wrong locations").

The idea of the non-coding transcripts is also implemented much better with this now. How to handle GFF that uses "pseudogene->pseudotranscript->pseudoexon", or something similar that is correct in Sequence Ontology terms, is still a little unclear, but if needed we can make a new issue.