GMOD / Apollo

Genome annotation editor with a Java server backend and a JavaScript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

Centromere/Pseudogene Biotypes #1922

Closed kpepper closed 6 years ago

kpepper commented 6 years ago

Hi,

A JBrowse developer (@cmdcolin) recommended that I post this issue here:

JBrowse Issue 1075

The issue concerns GFF3 export (via "Get GFF3" or "Save track data") from the User-created Annotations track, specifically for centromeres and pseudogenes (see the issue linked above). I would like to be able to drag up a centromere, export the GFF3, and see a single feature of type centromere in it. Similarly, for pseudogenes, I would like to see pseudogene->pseudogenic_transcript->pseudogenic_exon. Currently, the nearest I can get is pseudogene->transcript->exon.

Any ideas how I can get around this, other than post-processing the output GFF3?

nathandunn commented 6 years ago

@kpepper

Some questions:

  1. Do all of your pseudogenes follow the 3-level pattern or just some of them?

    pseudogene->pseudogenic_transcript->pseudogenic_exon

If so, I would just change it in a post-processing script. In Apollo, transcript implies pseudogenic_transcript (as opposed to ncRNA, mRNA, etc.), and there is only a single type of exon.

  2. Would it be helpful to set a default biotype for that track? If so, you could set that.

    https://genomearchitect.readthedocs.io/en/latest/Configure.html?highlight=biotype#set-the-default-biotype-for-dragging-up-evidence

You can still set it to anything you want, but by default, when you drag a feature up, it will get the biotype you configured. For pseudogene it would be "Transcript" or "transcript".
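
The post-processing approach suggested above could be sketched roughly as follows. This is only an illustration, not part of Apollo: the function names are hypothetical, and it assumes the exported GFF3 uses the literal types pseudogene, transcript, and exon, linked through standard ID and Parent attributes.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column into a dict of key -> raw value."""
    attrs = {}
    for field in column9.split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def parents_of(attrs):
    """Return the set of parent IDs (Parent may be comma-separated)."""
    return {p for p in attrs.get("Parent", "").split(",") if p}


def retype_pseudogene_children(lines):
    """Rename transcript/exon features that descend from a pseudogene.

    Hypothetical helper: assumes the input types are exactly
    'pseudogene', 'transcript' and 'exon'.
    """
    rows = [line.rstrip("\n") for line in lines]

    # Split feature lines into columns; leave comments, directives and
    # anything that is not a 9-column feature line untouched (None).
    parsed = []
    for row in rows:
        cols = row.split("\t")
        if row.startswith("#") or len(cols) != 9:
            parsed.append(None)
        else:
            parsed.append((cols, parse_attributes(cols[8])))

    # Pass 1: collect the IDs of all pseudogenes.
    pseudogene_ids = set()
    for item in parsed:
        if item and item[0][2] == "pseudogene" and "ID" in item[1]:
            pseudogene_ids.add(item[1]["ID"])

    # Pass 2: retype transcripts whose parent is a pseudogene.
    transcript_ids = set()
    for item in parsed:
        if item and item[0][2] == "transcript" and parents_of(item[1]) & pseudogene_ids:
            item[0][2] = "pseudogenic_transcript"
            if "ID" in item[1]:
                transcript_ids.add(item[1]["ID"])

    # Pass 3: retype exons whose parent is one of those transcripts.
    for item in parsed:
        if item and item[0][2] == "exon" and parents_of(item[1]) & transcript_ids:
            item[0][2] = "pseudogenic_exon"

    return ["\t".join(item[0]) if item else row for row, item in zip(rows, parsed)]
```

Passing the lines of an exported file to retype_pseudogene_children and writing the returned lines back out would yield pseudogene->pseudogenic_transcript->pseudogenic_exon. Separate passes are used so the result does not depend on the order of features in the file.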

kpepper commented 6 years ago

(1) Yes, all 3-level. Okay, understood.

(2) I currently set the biotype to pseudogene, which is how I get the pseudogene->transcript->exon GFF3 output. Is there any way of handling the centromere without post-processing? As mentioned in the JBrowse issue, when I set a biotype of "centromere" I get an error on dragging the feature to the User-created Annotations track, presumably because I'm using --type 'centromere' and centromere is not a defined type. But what else could I use for the type in this case?

nathandunn commented 6 years ago

How do you want to represent your centromere? As a single-level biotype with type 'centromere'?

If so, you would have to add it to the codebase explicitly, similar to repeat_region and transposable_element, and then you could annotate it directly. Happy to point you in the right direction on this if you want to open a PR.

FYI @deepakunni3 / @erasche: I thought we might have a good working example of how this was done. I seem to remember a really clean PR demonstrating this (maybe on a fork made by @erasche?), but I am unsure.

kpepper commented 6 years ago

Yes, a single-level centromere biotype. It's probably not essential for us at the moment, but thanks for the info in case we do need it later.