GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

Provide submenu when merging two different feature types. #23

Closed monicacecilia closed 9 years ago

monicacecilia commented 10 years ago

In an instance using the November Web Apollo release (at NAL/USDA), several pseudogene annotations have an mRNA child feature. This is not biologically sound, and frankly should not be possible (check the Sequence Ontology, SO).

Monica Poelchau found this out after the annotator conducted the modifications. Neither she nor I have been able to reproduce it.

[screenshot: 2014-09-11 at 1:00:56 am]

childers commented 10 years ago

I initially thought it was caused by a merge event based on how it was a mashup of two feature types that SO dictates should never go together.
After looking more deeply at the transaction history, I now think that this looks more like a creation issue, and not a merge issue. I’ve written up a longer description below.

We looked at it from several different directions, and were never able to recreate the pseudogene/mRNA features through various merge or creation events.

We dumped the history of transactions for that scaffold so that we could more explicitly see what was happening. Below is a segment of the history, showing the creation of one of these pseudogene/mRNA features. From the records, this is the only event that includes this feature's ID.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Unique id:93A2E6B57E5A8FFF1F40E42C0E77DC7C
    Transaction 0
        getOldFeatures(): []
        getNewFeatures(): [93A2E6B57E5A8FFF1F40E42C0E77DC7C (sequence:mRNA) [86823, 92472, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): ADD_TRANSCRIPT
        getFeatureUniqueName(): 93A2E6B57E5A8FFF1F40E42C0E77DC7C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 15:25:04 EDT 2014
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

From looking at these files, it looks like ADD_TRANSCRIPT is the operation for adding an mRNA feature, while ADD_FEATURE is the operation for adding a transcript type feature.

Now I think it is much more likely to be the result of some strange creation behavior. Is there a scenario where you could call ADD_TRANSCRIPT when you meant to call ADD_FEATURE? Or is there some mix of factors that could cause the creation types to be off?
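
One way to look for other features created this way is to scan a dump for the operation and feature type recorded in each feature's Transaction 0. The following is a minimal Python sketch, assuming the dump keeps the text layout shown above (`Unique id:`, `Transaction N`, `getNewFeatures()`, `getOperation()` lines). The function name `creation_events` is made up for illustration and is not part of Web Apollo.

```python
import re

# Two creation records copied from the dumps in this thread.
SAMPLE = """\
Unique id:93A2E6B57E5A8FFF1F40E42C0E77DC7C
    Transaction 0
        getOldFeatures(): []
        getNewFeatures(): [93A2E6B57E5A8FFF1F40E42C0E77DC7C (sequence:mRNA) [86823, 92472, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): ADD_TRANSCRIPT
Unique id:5EBD8C3B82C9EB94530AC722881C8E80
    Transaction 0
        getOldFeatures(): []
        getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111170, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): ADD_FEATURE
"""


def creation_events(dump):
    """Map each unique id to (operation, feature type) of its Transaction 0."""
    events = {}
    uid = None
    in_first = False   # True while reading lines that belong to Transaction 0
    ftype = None
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("Unique id:"):
            uid = line.split(":", 1)[1].strip()
            in_first, ftype = False, None
        elif line.startswith("Transaction "):
            in_first = line == "Transaction 0"
        elif in_first and line.startswith("getNewFeatures():"):
            match = re.search(r"\(sequence:(\w+)\)", line)
            ftype = match.group(1) if match else None
        elif in_first and line.startswith("getOperation():"):
            events[uid] = (line.split(":", 1)[1].strip(), ftype)
    return events


for uid, (operation, ftype) in creation_events(SAMPLE).items():
    print(uid, operation, ftype)
```

Listing the creation events this way would show at a glance which features started life as `ADD_TRANSCRIPT`/mRNA and which as `ADD_FEATURE`/transcript, without saying anything about why the wrong operation was used.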

Below are two much longer transaction sets I wanted to include for additional context. The first is another pseudogene/mRNA feature and the second is a pseudogene/transcript, for comparison.

>>>>>>>>>><><><><><><><pseudogene incorrectly created with mRNA><><><><><>>>>>>>>>>>
Unique id:4424476C24D941ABE3864A949EE6FD5C
    Transaction 0
        getOldFeatures(): []
        getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 107146, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): ADD_TRANSCRIPT
        getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 15:59:18 EDT 2014
    Transaction 1
        getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 107146, 1, Scaffold227], 91595F2E1E27C2D19B4B715F17AAF0A7 (sequence:mRNA) [113857, 114127, 1, Scaffold227]]
        getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): MERGE_TRANSCRIPTS
        getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 15:59:34 EDT 2014
    Transaction 2
        getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
        getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106249, 114127, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:00:15 EDT 2014
    Transaction 3
        getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106249, 114127, 1, Scaffold227]]
        getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106409, 114127, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:00:18 EDT 2014
    Transaction 4
        getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106409, 114127, 1, Scaffold227]]
        getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:00:24 EDT 2014
    Transaction 5
        getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
        getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:00:43 EDT 2014
    Transaction 6
        getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
        getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:01:04 EDT 2014

>>>>>>>>>><><><><><><><pseudogene correctly created with transcript><><><><><>>>>>>>>>>>
Unique id:5EBD8C3B82C9EB94530AC722881C8E80
    Transaction 0
        getOldFeatures(): []
        getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111170, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): ADD_FEATURE
        getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:02:17 EDT 2014
    Transaction 1
        getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111170, 1, Scaffold227]]
        getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111814, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:02:32 EDT 2014
    Transaction 2
        getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111814, 1, Scaffold227]]
        getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111626, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:02:51 EDT 2014
    Transaction 3
        getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111626, 1, Scaffold227]]
        getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111605, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): SET_EXON_BOUNDARIES
        getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:03:07 EDT 2014
    Transaction 4
        getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111605, 1, Scaffold227], B4FA93461AFB4CB4BB7A0575BC631B43 (sequence:mRNA) [113857, 114127, 1, Scaffold227]]
        getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 114127, 1, Scaffold227]]
        getAttributes(): {}
        getOperation(): MERGE_TRANSCRIPTS
        getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
        getEditor(): Editor_name_goes_here
        getDate(): Mon Aug 18 16:03:20 EDT 2014
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
nathandunn commented 10 years ago

Hi all,

Thanks for reminding me about this issue, I’ve been meaning to write up what I’ve found, and will cross-post this to the issue tracker. Forgive the long letter, I’ve been working on this for a while now.

From looking at these files, it looks like ADD_TRANSCRIPT is the operation for adding an mRNA feature, while ADD_FEATURE is the operation for adding a transcript type feature. Nathan, if you’ve had a chance to dig into this bit of code, I’d love to chat more about this, though all I have right now are questions.

Edited by colin

nathandunn commented 10 years ago

One annotation is a pseudogene, but has an mRNA child feature. I didn't think that was possible in Web Apollo, and I'm not sure how to reproduce it - is this a thing?

. . . no, it should not. pseudogenes should only have "transcript" as children. and the error is everywhere, too. 103b, 104a, 104c, ... many in that region.

childers commented 10 years ago

I also followed up with the original annotator, and received confirmation that these are all supposed to be pseudogenes. Hopefully knowing what the end product was supposed to be will help reduce the possibilities for how these features were created.

cmdcolin commented 10 years ago

I found a scenario with the add_transcripts_from_gff3_to_annotations.pl bulkloader that can create a pseudogene with an mRNA subfeature, but you have to be pretty specific to achieve this output.

Example GFF:

Group1.1    amel_OGSv3.2    pseudogene  507599  515039  1   -   .   ID=GB42155;Note="Testing"
Group1.1    amel_OGSv3.2    mRNA    507599  515039  1   -   .   ID=GB42155-RA;Parent=GB42155;Note=Testing
Group1.1    amel_OGSv3.2    exon    507599  509541  1   -   .   Parent=GB42155-RA
Group1.1    amel_OGSv3.2    exon    512910  513906  1   -   .   Parent=GB42155-RA
Group1.1    amel_OGSv3.2    exon    514009  514408  1   -   .   Parent=GB42155-RA
Group1.1    amel_OGSv3.2    exon    514761  515039  1   -   .   Parent=GB42155-RA

Example command line:

./add_transcripts_from_gff3_to_annotations.pl -u ***** -p ***** -U http://localhost:8080/WebApollo -g pseudogene -G pseudogene -i amel_pseudo.gff

This shows that you can create a pseudogene with mRNA children, but it doesn't necessarily explain the scenario in the case that they were using a drag and drop.
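
Input like this could be screened before bulk loading. The following is a rough Python sketch, not part of the bulkloader, that reports pseudogene features with an mRNA child in GFF3 text. It assumes nine whitespace- or tab-separated columns with `ID=`/`Parent=` attributes as in the example above; the function names are hypothetical.

```python
# First three lines of the example GFF above.
SAMPLE_GFF = """\
Group1.1    amel_OGSv3.2    pseudogene  507599  515039  1   -   .   ID=GB42155;Note="Testing"
Group1.1    amel_OGSv3.2    mRNA    507599  515039  1   -   .   ID=GB42155-RA;Parent=GB42155;Note=Testing
Group1.1    amel_OGSv3.2    exon    507599  509541  1   -   .   Parent=GB42155-RA
"""


def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def pseudogenes_with_mrna_children(gff_text):
    """Return (pseudogene ID, mRNA ID) pairs found in the GFF3 text."""
    types_by_id = {}
    children = []  # (child ID, child type, parent ID)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split(None, 8)  # at most 9 columns; attributes stay whole
        if len(cols) < 9:
            continue
        ftype = cols[2]
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            types_by_id[attrs["ID"]] = ftype
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children.append((attrs.get("ID"), ftype, parent))
    # Resolve parents after reading everything, so line order does not matter.
    return [
        (parent, child_id)
        for child_id, ftype, parent in children
        if ftype == "mRNA" and types_by_id.get(parent) == "pseudogene"
    ]


print(pseudogenes_with_mrna_children(SAMPLE_GFF))
```

A check like this only covers the bulk-loading path; it would not catch the same structure being produced through the interface.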

nathandunn commented 10 years ago

I got this in the interface . . . I merged a gene and a pseudogene and ended with a pseudogene that has a transcript and an mRNA.

The back-end does not prevent this, it only takes what the UI has. Question, what do you expect to see?

[screenshot: 2014-09-29 at 1:40:09 pm]

nathandunn commented 10 years ago

http://icebox.lbl.gov/WebApolloDemoStaging/jbrowse/?loc=Group1.10%3A102002..104756&tracks=DNA%2CAnnotations%2CAmel_4.5_NCBI_EST.gff%2CNCBI%20RefSeq%20Noncoding%20RNA&highlight=

The 2nd and 4th ones.

nathandunn commented 10 years ago

I had built 3 genes, 2 pseudogenes, and another gene and then merged one of the genes with one of the pseudogenes. I only did one merge.

childers commented 10 years ago

Should Web Apollo even allow merging of different types of features? Merging like features is very powerful, but merging unlike features can cause a lot of post-merge issues. For example, merging coding and non-coding features to make a non-coding feature would require the removal of CDS children.

nathandunn commented 10 years ago

I agree, just not allowing it at the outset makes sense. If I could get a matrix of allowed and disallowed merges, then I could just implement that directly.
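
A matrix like that could be as small as a lookup table. The following is a minimal Python sketch; the entries are placeholders that only encode "like merges with like" for two types, since no agreed matrix exists at this point in the thread, and the names `ALLOWED_MERGES` and `can_merge` are invented for illustration.

```python
# Placeholder matrix: each pair of top-level types that may be merged.
# Only same-type merges are listed; anything absent is disallowed.
ALLOWED_MERGES = {
    ("gene", "gene"),
    ("pseudogene", "pseudogene"),
}


def can_merge(type_a, type_b):
    """Look the pair up in either order, so the matrix is symmetric."""
    return (type_a, type_b) in ALLOWED_MERGES or (type_b, type_a) in ALLOWED_MERGES


for pair in [("gene", "gene"), ("gene", "pseudogene"), ("pseudogene", "gene")]:
    print(pair, "allowed" if can_merge(*pair) else "disallowed")
```

Keeping the rules in data rather than in code would let curators review or extend the table without touching the merge logic.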

monicacecilia commented 9 years ago

Well, not so fast.

This means we will not allow curators to EVER bring together a gene with a pseudogene?

A pseudogene annotation that has an mRNA child feature is not biologically sound, but that's our error, not the curator's. What happens when you start with a gene model as the starting hypothesis and then realize it is part of a pseudogene, either because you had already been working on one, or because you found biological evidence in support of it after you started the annotation? -- which, btw is what H. Robertson had been doing.

I disagree with Chris and Nathan, and I think we should allow merging of the two; THE CAVEAT is that if I am merging a gene to a pseudogene, the expected behaviour should always be that the entire model becomes a pseudogene and NO mRNA child is produced.

I know we currently do not allow curators to change their mind, but frankly, we should.

If the decision is made that this merge is not allowed (harrumph!), then we will have to be very clear with users that they are (still) not allowed to change their minds, and that if they need to bring a gene model in as part of a pseudogene, they have to start from the beginning using a pseudogene annotation instead. ... as in, I would need to add something about this to the user guide.

childers commented 9 years ago

I really like the idea of changing an annotation type during, or at the end of the curation, and I see that as being different from merging different types of features. Can we add that as a separate feature request?

Mixing different types of things, and predefining the result of the merge will be really complex. Web Apollo currently supports ten different types of features, and any of them could potentially be merged, potentially in larger combinations.

There could be a dialog box asking if the annotator knows they are mixing two different types of feature, then asking what the resulting feature should be (similar to the change feature type dialog), but I think it would be a lot easier to change the feature types to the final feature type first, then merge as normal.

nathandunn commented 9 years ago

So, the flexibility is good. I am seeing two options making themselves apparent:

1 - Allow merges with the behavior Moni outlined above.
2 - Allow changing of feature types (and thus the down-stream sub-types?) where only like features can be merged.

I think option 2 might be the easiest and most intuitive for all feature sets. It allows much more flexibility and simplifies the merging rules. I’m not sure how changing the feature type will affect its sub-types, or anything else related to the feature.

Thoughts?

Nathan

monicacecilia commented 9 years ago

le sigh. fiiiiiiine, then only allow merges of the same type of features.

Allowing the change of feature types should be an option. And yes, we would have to investigate what happens to the sub-types -- I am assuming they would change to correspond.

~m.

nathandunn commented 9 years ago

Okay, this is what I have so far (had not really explored all of the possibilities before):

Gene -> {A}
PseudoGene -> Transcript
transposable_element
repeat_region

A = mRNA (default), *RNA (t,sn,sno,nc,mi, r)

It looks like you can merge any of the transcripts together and it chooses one of the two types. transposable_element and repeat_region work as you would expect (no merges thank goodness). If I merge a gene into a transcript, that transcript is absolved / deleted as far as I can tell.

So . . . a proposal:
1 - You can change Gene to Pseudogene and its sub-type will change to Transcript.
2 - If you change Pseudogene to Gene, its sub-type will change to an mRNA by default.
3 - However, you can change the type to any of the *RNAs if a gene.
4 - You can only merge genes with other genes and pseudogenes with other pseudogenes. To merge a gene with a pseudogene see 1, above.
5 - You can merge a *RNA with any other *RNA, but first you have to change the type to match.
6 - How would you handle merges between a *RNA and a gene/mRNA?
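
Items 1 to 3 of the proposal can be sketched as a single type-change function. This is an illustration in Python, not Web Apollo code: the `Annotation` type and `change_type` function are hypothetical, and the expansion of "t,sn,sno,nc,mi, r" into tRNA, snRNA, snoRNA, ncRNA, miRNA and rRNA is an assumption about what the abbreviations stand for.

```python
from typing import NamedTuple

# Assumed reading of "mRNA (default), *RNA (t,sn,sno,nc,mi, r)".
GENE_CHILD_TYPES = ("mRNA", "tRNA", "snRNA", "snoRNA", "ncRNA", "miRNA", "rRNA")


class Annotation(NamedTuple):
    name: str
    parent_type: str  # "gene" or "pseudogene"
    child_type: str   # sub-type of the child feature


def change_type(annotation, new_parent_type, child_type=None):
    """Return a copy with the new parent type and a matching sub-type."""
    if new_parent_type == "pseudogene":
        # Item 1: a pseudogene's sub-type is always "transcript".
        if child_type not in (None, "transcript"):
            raise ValueError("a pseudogene can only have a transcript child")
        return annotation._replace(parent_type="pseudogene", child_type="transcript")
    if new_parent_type == "gene":
        # Items 2 and 3: mRNA by default, or any of the listed RNA types.
        chosen = child_type or "mRNA"
        if chosen not in GENE_CHILD_TYPES:
            raise ValueError(f"unsupported gene sub-type: {chosen}")
        return annotation._replace(parent_type="gene", child_type=chosen)
    raise ValueError(f"unsupported type: {new_parent_type}")


gene = Annotation("abcd", "gene", "mRNA")
pseudo = change_type(gene, "pseudogene")
print(pseudo)
print(change_type(pseudo, "gene"))
print(change_type(pseudo, "gene", child_type="tRNA"))
```

Under items 4 and 5, a curator would call something like `change_type` on one of the two features first, so that both parent types and sub-types match before merging. Item 6 is left open here, as it is in the proposal.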

I feel like something is missing.

selewis commented 9 years ago

This should not be allowed.

On Sep 29, 2014, at 14:19, childers notifications@github.com wrote:

Should Web Apollo even allow merging of different types of features? Merging like features is very powerful, but merging unlike features can cause a lot of post-merge issues. For example, merging coding and non-coding features to make a non-coding feature would require the removal of CDS children.

selewis commented 9 years ago

Moni has it right. Retract my last message.

selewis commented 9 years ago

This should be on Thursday's agenda. I don't think either Moni or I are getting our message across. Need audio.

nathandunn commented 9 years ago

I think that makes sense. Let’s talk about it on Thursday. I think there are a number of nuances with every solution that need to be discussed.

Nathan

nathandunn commented 9 years ago

The procedure would be:
1 - select two features
2 - select “merge”
3 - "merge" has a submenu with both features, and they will have the text (2 of these): “Set Primary :: "

e.g.,
“Set Primary mRNA abcd-1234 :: Gene abcd”
“Set Primary transcript defgh-1234 :: Pseudogene defgh”

The “primary” will be the final-merged type, subtype, name, symbol, etc, with everything else merged. Does this sound right?
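
As a rough illustration of this procedure, the Python sketch below builds the two submenu labels and merges with a chosen primary. Everything here is hypothetical (the `Feature` fields, `submenu_labels`, `merge_with_primary`); the label layout is inferred from the two examples above, the coordinates are borrowed from the transaction dumps earlier in the thread, and "everything else merged" is reduced to taking the union of the two spans.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    parent_type: str  # e.g. "Gene" or "Pseudogene"
    parent_name: str
    child_type: str   # e.g. "mRNA" or "transcript"
    child_name: str
    start: int
    end: int


def submenu_labels(first, second):
    """One 'Set Primary' entry per selected feature."""
    return [
        f"Set Primary {f.child_type} {f.child_name} :: {f.parent_type} {f.parent_name}"
        for f in (first, second)
    ]


def merge_with_primary(primary, secondary):
    """Keep the primary's types and names; extend the span to cover both."""
    return primary._replace(
        start=min(primary.start, secondary.start),
        end=max(primary.end, secondary.end),
    )


gene = Feature("Gene", "abcd", "mRNA", "abcd-1234", 113857, 114127)
pseudo = Feature("Pseudogene", "defgh", "transcript", "defgh-1234", 110759, 111605)

for label in submenu_labels(gene, pseudo):
    print(label)

# Choosing the pseudogene as primary yields a pseudogene with a transcript child.
print(merge_with_primary(pseudo, gene))
```

With the pseudogene chosen as primary, the result has no mRNA child, which is the outcome Moni asked for when a gene is merged into a pseudogene.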

monicacecilia commented 9 years ago

@nathandunn will do what he proposed on 2014-10-02 for 2.0

The ability to change the type of a feature after it has been dragged to the Uc-A area should also be implemented, but I will add an issue for 2.1. See #220