Closed monicacecilia closed 9 years ago
Hi all,
Thanks for reminding me about this issue, I’ve been meaning to write up what I’ve found, and will cross-post this to the issue tracker. Forgive the long letter, I’ve been working on this for a while now.
I initially thought it was caused by a merge event based on how it was a mashup of two feature types that SO dictates should never go together.
After looking more deeply at the transaction history, this looks more like a creation issue, not a merge issue. I’ve written up a longer description below.
We looked at it from several different directions, and were never able to recreate the pseudogene/mRNA features through various merge or creation events. Finally, we dumped the history of transactions for that scaffold so that we could see more explicitly what was happening. Below is a segment of the history, showing the creation of one of these pseudogene/mRNA features. From the records, this is the only event that includes this feature’s ID.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Unique id:93A2E6B57E5A8FFF1F40E42C0E77DC7C
Transaction 0
getOldFeatures(): []
getNewFeatures(): [93A2E6B57E5A8FFF1F40E42C0E77DC7C (sequence:mRNA) [86823, 92472, 1, Scaffold227]]
getAttributes(): {}
getOperation(): ADD_TRANSCRIPT
getFeatureUniqueName(): 93A2E6B57E5A8FFF1F40E42C0E77DC7C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 15:25:04 EDT 2014
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
From looking at these files, it looks like ADD_TRANSCRIPT is the operation for adding an mRNA feature, while ADD_FEATURE is the operation for adding a transcript type feature. Nathan, if you’ve had a chance to dig into this bit of code, I’d love to chat more about this, though all I have right now are questions.
Now I think it is much more likely to be the result of some strange creation behavior. Is there a scenario where you could call ADD_TRANSCRIPT when you meant to call ADD_FEATURE? Or is there some mix of factors that could cause the creation types to be off?
Below are two much longer transaction sets I wanted to include for additional context. The first is another pseudogene/mRNA feature and the second is a pseudogene/transcript, for comparison.
>>>>>>>>>><><><><><><><pseudogene incorrectly created with mRNA><><><><><>>>>>>>>>>>
Unique id:4424476C24D941ABE3864A949EE6FD5C
Transaction 0
getOldFeatures(): []
getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 107146, 1, Scaffold227]]
getAttributes(): {}
getOperation(): ADD_TRANSCRIPT
getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 15:59:18 EDT 2014
Transaction 1
getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 107146, 1, Scaffold227], 91595F2E1E27C2D19B4B715F17AAF0A7 (sequence:mRNA) [113857, 114127, 1, Scaffold227]]
getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
getAttributes(): {}
getOperation(): MERGE_TRANSCRIPTS
getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 15:59:34 EDT 2014
Transaction 2
getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106249, 114127, 1, Scaffold227]]
getAttributes(): {}
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:00:15 EDT 2014
Transaction 3
getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106249, 114127, 1, Scaffold227]]
getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106409, 114127, 1, Scaffold227]]
getAttributes(): {}
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:00:18 EDT 2014
Transaction 4
getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106409, 114127, 1, Scaffold227]]
getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
getAttributes(): {}
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:00:24 EDT 2014
Transaction 5
getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
getAttributes(): {}
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:00:43 EDT 2014
Transaction 6
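getOldFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
getNewFeatures(): [4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 114127, 1, Scaffold227]]
getAttributes(): {}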
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 4424476C24D941ABE3864A949EE6FD5C
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:01:04 EDT 2014
>>>>>>>>>><><><><><><><pseudogene correctly created with transcript><><><><><>>>>>>>>>>>
Unique id:5EBD8C3B82C9EB94530AC722881C8E80
Transaction 0
getOldFeatures(): []
getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111170, 1, Scaffold227]]
getAttributes(): {}
getOperation(): ADD_FEATURE
getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:02:17 EDT 2014
Transaction 1
getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111170, 1, Scaffold227]]
getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111814, 1, Scaffold227]]
getAttributes(): {}
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:02:32 EDT 2014
Transaction 2
getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111814, 1, Scaffold227]]
getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111626, 1, Scaffold227]]
getAttributes(): {}
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:02:51 EDT 2014
Transaction 3
getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111626, 1, Scaffold227]]
getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111605, 1, Scaffold227]]
getAttributes(): {}
getOperation(): SET_EXON_BOUNDARIES
getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:03:07 EDT 2014
Transaction 4
getOldFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 111605, 1, Scaffold227], B4FA93461AFB4CB4BB7A0575BC631B43 (sequence:mRNA) [113857, 114127, 1, Scaffold227]]
getNewFeatures(): [5EBD8C3B82C9EB94530AC722881C8E80 (sequence:transcript) [110759, 114127, 1, Scaffold227]]
getAttributes(): {}
getOperation(): MERGE_TRANSCRIPTS
getFeatureUniqueName(): 5EBD8C3B82C9EB94530AC722881C8E80
getEditor(): Editor_name_goes_here
getDate(): Mon Aug 18 16:03:20 EDT 2014
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
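For anyone who wants to scan a whole scaffold dump rather than read it by eye, here is a rough Python sketch that pulls out only the creation events (transactions whose getOldFeatures() list is empty) and reports which operation created which child type. It assumes the dump looks exactly like the excerpts above; the function name `creation_events` is made up for this example and is not part of Web Apollo.

```python
import re

# Matches one feature entry, e.g.
# "4424476C24D941ABE3864A949EE6FD5C (sequence:mRNA) [106411, 107146, 1, Scaffold227]"
FEATURE_RE = re.compile(r"(\w+) \(sequence:(\w+)\)")


def creation_events(dump_text):
    """Return (unique_id, operation, created_type) for every transaction
    whose getOldFeatures() list is empty, i.e. every creation event.

    Assumes getOldFeatures() and getNewFeatures() are printed before
    getOperation() within a transaction, as in the dump above.
    """
    events = []
    current = {}
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("Transaction "):
            current = {}  # a new transaction record starts here
            continue
        key, sep, value = line.partition("(): ")
        if not sep:
            continue  # separator lines, "Unique id:" lines, blank lines
        current[key] = value
        if key == "getOperation" and current.get("getOldFeatures") == "[]":
            match = FEATURE_RE.search(current.get("getNewFeatures", ""))
            if match:
                events.append((match.group(1), value, match.group(2)))
    return events


if __name__ == "__main__":
    example = "\n".join([
        "Transaction 0",
        "getOldFeatures(): []",
        "getNewFeatures(): [93A2E6B57E5A8FFF1F40E42C0E77DC7C (sequence:mRNA) [86823, 92472, 1, Scaffold227]]",
        "getAttributes(): {}",
        "getOperation(): ADD_TRANSCRIPT",
    ])
    for unique_id, operation, created_type in creation_events(example):
        print(unique_id, operation, created_type)
```

Cross-referencing the resulting IDs against the annotations that are typed as pseudogenes would show how many of them were created through ADD_TRANSCRIPT rather than ADD_FEATURE.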
Edited by colin
One annotation is a pseudogene, but has an mRNA child feature. I didn't think that was possible in Web Apollo, and I'm not sure how to reproduce it - is this a thing?
. . . no, it should not. Pseudogenes should only have "transcript" as children. And the error is everywhere, too: 103b, 104a, 104c, ... many in that region.
I also followed up with the original annotator, and received confirmation that these are all supposed to be pseudogenes. Hopefully knowing what the end product was supposed to be will help reduce the possibilities for how these features were created.
I found a scenario with the add_transcripts_from_gff3_to_annotations.pl bulkloader that can create a pseudogene with an mRNA subfeature, but you have to be pretty specific to achieve this output.
Example GFF:
Group1.1 amel_OGSv3.2 pseudogene 507599 515039 1 - . ID=GB42155;Note="Testing"
Group1.1 amel_OGSv3.2 mRNA 507599 515039 1 - . ID=GB42155-RA;Parent=GB42155;Note=Testing
Group1.1 amel_OGSv3.2 exon 507599 509541 1 - . Parent=GB42155-RA
Group1.1 amel_OGSv3.2 exon 512910 513906 1 - . Parent=GB42155-RA
Group1.1 amel_OGSv3.2 exon 514009 514408 1 - . Parent=GB42155-RA
Group1.1 amel_OGSv3.2 exon 514761 515039 1 - . Parent=GB42155-RA
Example command line:
./add_transcripts_from_gff3_to_annotations.pl -u ***** -p ***** -U http://localhost:8080/WebApollo -g pseudogene -G pseudogene -i amel_pseudo.gff
This shows that you can create a pseudogene with mRNA children, but it doesn't necessarily explain the scenario in the case that they were using a drag and drop.
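A quick way to catch this kind of input before bulk loading is to scan the GFF3 for mRNA rows whose Parent is a pseudogene. The following is only a sketch in Python, separate from the Perl bulkloader; the function name is invented for illustration, and it splits on any whitespace so that it works both on tab-delimited GFF3 and on the space-separated example as pasted above.

```python
def pseudogenes_with_mrna_children(gff_lines):
    """Return (pseudogene_id, mrna_id) pairs for every mRNA row that
    names a pseudogene row as its Parent."""
    pseudogene_ids = set()
    mrna_parents = []  # (mrna_id, parent_id), resolved after the full pass
    for raw in gff_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split(None, 8)  # 8 fixed columns + the attributes column
        if len(cols) < 9:
            continue
        feature_type = cols[2]
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if feature_type == "pseudogene" and "ID" in attrs:
            pseudogene_ids.add(attrs["ID"])
        elif feature_type == "mRNA" and "Parent" in attrs:
            # GFF3 allows several comma-separated parents
            for parent in attrs["Parent"].split(","):
                mrna_parents.append((attrs.get("ID", ""), parent))
    return [
        (parent, mrna_id)
        for mrna_id, parent in mrna_parents
        if parent in pseudogene_ids
    ]


if __name__ == "__main__":
    example = [
        'Group1.1 amel_OGSv3.2 pseudogene 507599 515039 1 - . ID=GB42155;Note="Testing"',
        "Group1.1 amel_OGSv3.2 mRNA 507599 515039 1 - . ID=GB42155-RA;Parent=GB42155;Note=Testing",
        "Group1.1 amel_OGSv3.2 exon 507599 509541 1 - . Parent=GB42155-RA",
    ]
    print(pseudogenes_with_mrna_children(example))
```

Parents are resolved only after the whole file has been read, so the check does not depend on the pseudogene row appearing before its mRNA row.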
I got this in the interface . . . I merged a gene and a pseudogene and ended with a pseudogene that has a transcript and an mRNA.
The back-end does not prevent this; it only takes what the UI has. Question: what do you expect to see?
I had built 3 genes, 2 pseudogenes, and another gene and then merged one of the genes with one of the pseudogenes. I only did one merge.
Should Web Apollo even allow merging of different types of features? Merging like features is very powerful, but merging not alike features can cause a lot of post-merge issues. For example, merging coding and non-coding features to make a non-coding feature would require the removal of CDS children.
I agree, just not allowing it at the outset makes sense. If I could get a matrix of allowed and disallowed merges, then I could just implement that directly.
Well, not so fast.
This means we will not allow curators to EVER bring together a gene with a pseudogene?
A pseudogene annotation that has an mRNA child feature is not biologically sound, but that's our error, not the curator's. What happens when you start with a gene model as the starting hypothesis and then realize it is part of a pseudogene, either because you had already been working on one, or because you found biological evidence in support of it after you started the annotation? -- which, btw is what H. Robertson had been doing.
I disagree with Chris and Nathan, and I think we should allow merging of the two; THE CAVEAT is that if I am merging a gene to a pseudogene, the expected behaviour should always be that the entire model becomes a pseudogene and NO mRNA child is produced.
I know we currently do not allow curators to change their mind, but frankly, we should.
If the decision is made that this merge is not allowed (harrumph!), then we will have to be very clear with users that they are (still) not allowed to change their minds, and that if they need to bring a gene model as part of a pseudogene, they have to start from the beginning using a pseudogene annotation instead. ... as in, I would need to add something about this on the user guide.
I really like the idea of changing an annotation type during, or at the end of the curation, and I see that as being different from merging different types of features. Can we add that as a separate feature request?
Mixing different types of things, and predefining the result of the merge will be really complex. Web Apollo currently supports ten different types of features, and any of them could potentially be merged, potentially in larger combinations.
There could be a dialog box asking if the annotator knows they are mixing two different types of feature, then asking what the resulting feature should be (similar to the change feature type dialog), but I think it would be a lot easier to change the feature types to the final feature type first, then merge as normal.
So, the flexibility is good. I am seeing two options making themselves apparent:
1 - Allow merges with the behavior Moni outlined above.
2 - Allow changing of feature types (and thus the down-stream sub-types?) where only like features can be merged.
I think #2 might be the easiest and most intuitive for all feature sets. It allows much more flexibility and simplifies the merging rules. I’m not sure how changing the feature type will affect its sub-types, or anything else related to the feature.
Thoughts?
Nathan
le sigh. fiiiiiiine, then only allow merges of the same type of features.
Allowing the change of feature types should be an option. And yes, we would have to investigate what happens to the sub-types -- I am assuming they would change to correspond.
~m.
Okay, this is what I have so far (had not really explored all of the possibilities before):
Gene -> {A}
PseudoGene -> Transcript
transposable_element
repeat_region
A = mRNA (default), *RNA (t, sn, sno, nc, mi, r)
It looks like you can merge any of the transcripts together and it chooses one of the two types. transposable_element and repeat_region work as you would expect (no merges, thank goodness). If I merge a gene into a transcript, that transcript is absorbed / deleted as far as I can tell.
So . . . a proposal:
1 - You can change Gene to Pseudogene and its sub-type will change to Transcript.
2 - If you change Pseudogene to Gene, its sub-type will change to an mRNA by default.
3 - However, you can change the type to any of the *RNA's if a gene.
4 - You can only merge genes with other genes and pseudogenes with other pseudogenes. To merge a gene with a pseudogene see 1, above.
5 - You can merge a *RNA with any other *RNA, but first you have to change the type to match.
6 - How would you handle merges between a *RNA and a gene/mRNA?
I feel like something is missing.
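To make items 1 through 5 of the proposal concrete, here is a small Python sketch. It is only an illustration of the rules as written, not Web Apollo code: annotations are plain dicts, all names are hypothetical, and item 6 is deliberately left unmodeled because it is still an open question.

```python
# Default child (sub-type) for each parent type, per items 1 and 2.
DEFAULT_CHILD = {"gene": "mRNA", "pseudogene": "transcript"}

# Child types a gene may carry, per "A = mRNA (default), *RNA (t,sn,sno,nc,mi,r)".
GENE_CHILD_TYPES = {"mRNA", "tRNA", "snRNA", "snoRNA", "ncRNA", "miRNA", "rRNA"}


def change_type(annotation, new_type):
    """Items 1 and 2: changing the parent type resets the child to the
    default sub-type for the new parent."""
    if new_type not in DEFAULT_CHILD:
        raise ValueError("unsupported type: " + new_type)
    return {**annotation, "type": new_type, "child_type": DEFAULT_CHILD[new_type]}


def change_child_type(annotation, new_child_type):
    """Item 3: only a gene may switch its child to another *RNA type."""
    if annotation["type"] != "gene" or new_child_type not in GENE_CHILD_TYPES:
        raise ValueError("child type change not allowed")
    return {**annotation, "child_type": new_child_type}


def can_merge(a, b):
    """Items 4 and 5: only like merges with like, so both the parent
    type and the child type have to match before merging."""
    return a["type"] == b["type"] and a["child_type"] == b["child_type"]


if __name__ == "__main__":
    gene = {"name": "abcd", "type": "gene", "child_type": "mRNA"}
    pseudo = {"name": "defgh", "type": "pseudogene", "child_type": "transcript"}
    print(can_merge(gene, pseudo))                             # refused as-is
    print(can_merge(change_type(gene, "pseudogene"), pseudo))  # allowed after item 1
```

Under these rules a pseudogene with an mRNA child cannot be produced by a merge, because the gene has to be converted (and its child reset to transcript) before the merge is accepted.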
This should not be allowed.
Moni has it right. Retract my last message
This should be on Thursday's agenda. I don't think Moni nor I are getting our message across. Need audio.
I think that makes sense. Let’s talk about it on Thursday. I think there are a number of nuances with every solution that need to be discussed.
Nathan
The procedure would be:
1 - select two features
2 - select “merge”
3 - "merge" has submenu with both features and they will have the text (2 of these):
“Set Primary
e.g., “Set Primary mRNA abcd-1234 :: Gene abcd”
“Set Primary transcript defgh-1234 :: Pseudogene defgh”
The “primary” will be the final-merged type, subtype, name, symbol, etc, with everything else merged. Does this sound right?
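A minimal Python sketch of how I read the "primary" idea, with loud assumptions: annotations are plain dicts, the field names are made up, and "everything else merged" is taken to mean pooling the exons, which the thread does not actually spell out.

```python
def menu_label(annotation):
    """Build the submenu text in the style of the examples above,
    e.g. "Set Primary mRNA abcd-1234 :: Gene abcd"."""
    return "Set Primary {} {} :: {} {}".format(
        annotation["child_type"],
        annotation["child_name"],
        annotation["type"].capitalize(),
        annotation["name"],
    )


def merge_with_primary(primary, secondary):
    """Keep type, sub-type, name and symbol from the primary; pool the
    exons of both features (assumption: exons are (start, end) tuples)."""
    merged = dict(primary)
    merged["exons"] = sorted(set(primary["exons"]) | set(secondary["exons"]))
    return merged


if __name__ == "__main__":
    gene = {"type": "gene", "name": "abcd", "child_type": "mRNA",
            "child_name": "abcd-1234", "exons": [(113857, 114127)]}
    pseudo = {"type": "pseudogene", "name": "defgh", "child_type": "transcript",
              "child_name": "defgh-1234", "exons": [(110759, 111605)]}
    print(menu_label(gene))
    print(menu_label(pseudo))
    print(merge_with_primary(pseudo, gene))
```

With the pseudogene chosen as primary, the merged feature keeps the pseudogene type and its transcript child, so no mRNA child survives the merge.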
@nathandunn will do what he proposed on 2014-10-02 for 2.0
The ability to change the type of a feature after it has been dragged to Uc-A area should also be implemented, but will add an issue for 2.1. See #220
In an instance using the November Web Apollo release (at NAL/USDA), several pseudogene annotations have an mRNA child feature. This is not biologically sound - and frankly should not be possible (check SO).
Monica Poelchau found this out after the annotator conducted the modifications. Neither she nor I have been able to reproduce it.