GMOD / jbrowse

JBrowse 1, a full-featured genome browser built with JavaScript and HTML5. For JBrowse 2, see https://github.com/GMOD/jbrowse-components.
http://jbrowse.org

Pseudogenes #1075

Closed kpepper closed 6 years ago

kpepper commented 6 years ago

Hi,

I have pseudogene, pseudogenic_exon and pseudogenic_transcript features in some gff files - any way to represent these on one track in JBrowse/Apollo, using flatfile-to-json.pl?

Thanks.

cmdcolin commented 6 years ago

For HTMLFeatures, which is more commonly used with Apollo, you can use something like the command below. Note that HTMLFeatures doesn't capture the three-level hierarchy well, so you can instead "load at the pseudotranscript level". This is analogous to the flatfile-to-json commands for HTMLFeatures that use --type mRNA, with --type pseudotranscript substituted:

bin/flatfile-to-json.pl --type pseudotranscript --trackLabel pseudogenes --gff pseudogenes.gff --className feature2

For CanvasFeatures, if you only want to capture the two level hierarchy, you can just run

bin/flatfile-to-json.pl --type pseudotranscript --trackLabel pseudogenes --gff pseudogenes.gff --trackType CanvasFeatures

If you want to capture the 3 level pseudogene hierarchy, it might be worth adding some custom code, so let us know if that is desirable :) I am thinking that probably adding a new Pseudogene glyph or something similar would be good.

Here's how both of these look with a little sample pseudogene gff

[Screenshot: localhost, 2018-06-25 14:29:44]

cmdcolin commented 6 years ago

For the three-level hierarchy without the custom Pseudogene glyph, you can write a config that manually specifies "glyph", "transcriptType", and "subParts" (plus a custom color so it stands out):

  {
     "label" : "pseudo",
     "storeClass" : "JBrowse/Store/SeqFeature/NCList",
     "glyph" : "JBrowse/View/FeatureGlyph/Gene",
     "subParts" : "pseudoexon",
     "transcriptType" : "pseudotranscript",
     "type" : "CanvasFeatures",
     "urlTemplate" : "tracks/pseudo/{refseq}/trackData.json",
     "style": { "color": "red" }
  },

kpepper commented 6 years ago

@cmdcolin Thanks a lot for the detailed response. I'll let you know how I get on.

rbuels commented 6 years ago

There is also the topLevelFeatures configuration option, which can let you filter or "hoist" subfeatures to be top-level for the purposes of display.

kpepper commented 6 years ago

@rbuels So you would specify pseudogene, pseudogenic_transcript and pseudogenic_exon as a JSON array for the topLevelFeatures key if you wanted them all on one track?

cmdcolin commented 6 years ago

You could use topLevelFeatures to set

topLevelFeatures=pseudogenic_transcript

The topLevelFeatures option lets you load, for example, a three-level hierarchy but display only two levels of it, by setting which feature type is treated as the top level.
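As a rough illustration of the point above, a track config carrying this option could be generated with a short script. This is only a sketch: the label, storeClass and urlTemplate are copied from the config earlier in this thread, the feature type name comes from the original question, and `make_track` is a hypothetical helper, not part of JBrowse. Whether an array of type names is also accepted should be confirmed in the configuration guide for your JBrowse version.

```python
import json


def make_track(top_level):
    """Build a CanvasFeatures track config dict.

    Hypothetical helper for illustration only; not a JBrowse API.
    """
    return {
        "label": "pseudo",
        "storeClass": "JBrowse/Store/SeqFeature/NCList",
        "type": "CanvasFeatures",
        "urlTemplate": "tracks/pseudo/{refseq}/trackData.json",
        # Features of this type are treated as top level for display.
        "topLevelFeatures": top_level,
    }


track = make_track("pseudogenic_transcript")
# Emit JSON that could be pasted into the "tracks" array of a trackList.json.
print(json.dumps(track, indent=2))
```

The printed object is a starting point only; the glyph, subParts and transcriptType keys from the earlier config may still be needed depending on how the features were loaded.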

cmdcolin commented 6 years ago

I think that this issue is solved?

If it's ok I'll close this for now, but if you have questions feel free to post back here or reopen

I created a new issue for creating a CanvasFeatures pseudogene glyph https://github.com/GMOD/jbrowse/issues/1106

kpepper commented 6 years ago

If I export a pseudogene glyph feature to gff3 from JBrowse I then get a Pseudogene/transcript/exon structure in the gff. I would like to get a pseudogene/pseudogenic_transcript/pseudogenic_exon structure - is that possible or would I need to do some script manipulation afterwards?

And kind of related, what is the best JBrowse feature type to represent a centromere gff feature type? Is there any way of ensuring that when you export it to gff3 again it gets exported as "centromere" rather than whatever JBrowse type is used to represent it?

cmdcolin commented 6 years ago

@kpepper For the centromere, you can simply set "centromere" as the feature type (column 3) in the GFF and load it the standard way:

bin/flatfile-to-json.pl --gff out.gff --trackLabel centromeres

Then it is simple to see the features and export them with "Save track data"

[Screenshot: localhost, 2018-07-16 15:09:36]

The use case for pseudogene, pseudogenic_transcript and pseudogenic_exon is, I think, the same idea: it loads fine and exports fine via "Save track data", as far as I can tell.

sample gff

ctgA    .       pseudogene      2000    3000    166     5       .       ID=ps;
ctgA    .       pseudogenic_transcript        2000    3000    166     6       .       ID=pt;Parent=ps;
ctgA    .       pseudogenic_exon      2000    2100    166     8       .       Parent=pt;
ctgA    .       pseudogenic_exon      2500    2600    166     8       .       Parent=pt;
ctgA    .       pseudogenic_exon      2800    3000    166     8       .       Parent=pt;

[Screenshot: localhost, 2018-07-16 15:20:18]
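To make the three-level structure of the sample GFF explicit, the following sketch follows the ID/Parent attributes and reports the chain of feature types. It assumes the columns are tab-separated and copies the values exactly as posted above; `parse_gff3` and `type_chains` are hypothetical helper names, and this is not how JBrowse itself parses GFF3.

```python
# Sample lines copied from the thread; tab separation is an assumption.
SAMPLE = "\n".join([
    "ctgA\t.\tpseudogene\t2000\t3000\t166\t5\t.\tID=ps;",
    "ctgA\t.\tpseudogenic_transcript\t2000\t3000\t166\t6\t.\tID=pt;Parent=ps;",
    "ctgA\t.\tpseudogenic_exon\t2000\t2100\t166\t8\t.\tParent=pt;",
    "ctgA\t.\tpseudogenic_exon\t2500\t2600\t166\t8\t.\tParent=pt;",
    "ctgA\t.\tpseudogenic_exon\t2800\t3000\t166\t8\t.\tParent=pt;",
])


def parse_attributes(column9):
    """Split 'ID=pt;Parent=ps;' into a dict, ignoring empty trailing fields."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return attrs


def parse_gff3(text):
    """Return one dict per feature line, skipping blanks and comments."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        features.append({
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "id": attrs.get("ID"),
            "parent": attrs.get("Parent"),
        })
    return features


def type_chains(features):
    """Walk Parent links upward and collect the distinct type chains."""
    by_id = {f["id"]: f for f in features if f["id"]}
    chains = set()
    for feature in features:
        chain = [feature["type"]]
        parent = feature["parent"]
        while parent in by_id:
            chain.append(by_id[parent]["type"])
            parent = by_id[parent]["parent"]
        chains.add(" > ".join(reversed(chain)))
    return chains


for chain in sorted(type_chains(parse_gff3(SAMPLE))):
    print(chain)
```

For this sample, the longest chain reported is pseudogene > pseudogenic_transcript > pseudogenic_exon, which is the hierarchy discussed in this thread.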

cmdcolin commented 6 years ago

Let me know if that all makes sense. When you said export, I assumed you meant "Save track data".

kpepper commented 6 years ago

Hi @cmdcolin Thanks for the reply. So yes, export = Get GFF3 or Save track data. I was loading centromere as you suggested, but if I then drag a centromere to the annotation track and save it, it's saved as gene.mRNA.cds.exon rather than centromere biotype. The type on the track shows as mRNA. If I change the default-biotype to "centromere" I then get an error when I drag the feature to the annotation track. I'm using: Apollo 2.1.0 and JBrowse 15dfd2309f2d508d8bed782d0f68b38dd9927bb4.

cmdcolin commented 6 years ago

The user-created annotations are Apollo-related. In my experience it has been a little tricky to annotate anything other than gene/mRNA/CDS/exon, but it might be possible with some configuration. Anyway, the Apollo GitHub is probably the best place to ask!

Edit: That includes export/"Get GFF3" on the user-created annotations, which goes through Apollo's custom code.
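For the earlier question about getting pseudogene/pseudogenic_transcript/pseudogenic_exon types back out of an export, post-processing the file is one option. The sketch below rewrites only the type column (column 3) of a GFF3 file. The mapping is an assumption based on the Pseudogene/transcript/exon structure described above, `retype_line` and `retype_gff3` are hypothetical helpers, and the example line is made up for illustration. It also assumes the file contains only pseudogene features, since it would otherwise retype ordinary transcripts and exons too.

```python
# Assumed mapping from exported type names to the desired ones.
TYPE_MAP = {
    "Pseudogene": "pseudogene",
    "transcript": "pseudogenic_transcript",
    "exon": "pseudogenic_exon",
}


def retype_line(line, type_map=TYPE_MAP):
    """Rewrite column 3 of one GFF3 feature line; pass other lines through."""
    if not line.strip() or line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 9:
        # Not a well-formed feature line; leave it untouched.
        return line
    cols[2] = type_map.get(cols[2], cols[2])
    return "\t".join(cols)


def retype_gff3(text, type_map=TYPE_MAP):
    """Apply retype_line to every line of a GFF3 document."""
    return "\n".join(retype_line(line, type_map) for line in text.splitlines())


# Made-up example input, not taken from a real export.
exported = "\n".join([
    "##gff-version 3",
    "ctgA\t.\tPseudogene\t2000\t3000\t.\t+\t.\tID=ps",
    "ctgA\t.\ttranscript\t2000\t3000\t.\t+\t.\tID=pt;Parent=ps",
    "ctgA\t.\texon\t2000\t2100\t.\t+\t.\tParent=pt",
])
print(retype_gff3(exported))
```

Types not listed in the mapping, such as centromere, pass through unchanged.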

kpepper commented 6 years ago

Okay, thanks for the help, much appreciated.