The-Sequence-Ontology / SO-Ontologies

Collection of SO Ontologies
Creative Commons Attribution 4.0 International

repeat_fragment [sf#210] #210

Open srynobio opened 9 years ago

srynobio commented 9 years ago

Reported by scalabrin on 2010-06-22 13:06 UTC

Dear curator,

I checked the documentation and the mailing list, in particular this: http://sourceforge.net/tracker/index.php?func=detail&aid=1720110&group_id=72703&atid=810408

but I am still unable to write gff3 files with nested repeats. For example, let us assume that A_frag1..B..A_frag2 is a repetitive region with repeat B nested inside repeat A. How do I specify it?

I would like to use repeat_fragment, but I see it is a topologically_defined_region, which sits at the same level of the tree as biological_region, the branch that includes the features I need (e.g. transposable_element).

I repeat, it is probably me who is unable to understand the logic behind it, but how would you describe the situation I sketched out (or an even more complicated one, with a region nested in a region that is nested in another region, and so on)?

srynobio commented 9 years ago

Response from @keilbeck

Hi Simone,

Thanks. I think you have found some problems. The initial repeat fragment work was abandoned and then got rolled up in the topological terms. It should not be there. It needs to be biological. Let me try to draw what I think it is you want:

|--------Afrag-------------||---B----------------||-------------A-frag-----|
[repeat-fragment-----][repeat-region][repeat-fragment---]
[-------------------------------repeat-region-------------------------------]

repeat-fragment part_of repeat_region

This would allow you to build up the parts of your nested repeats, regardless of how complex they are, I think. Let me know if it works, and I'll make the changes. I leave for vacation on Friday though, so let's try to get it done before then.

Cheers,
Sorry if the lines don't match up...
Karen

srynobio commented 9 years ago

Response from @keilbeck

I went ahead and moved some things around to make this workable for you. I added nested repeat as a kind of repeat region.

repeat_fragment part of nested repeat

Let me know if this solves your problem. (I also did the same thing for transposon fragment.) It mirrors the way transcripts with introns are handled.

--Karen

srynobio commented 9 years ago

Response from Simone Scalabrin

Dear Karen,

I apologize for not answering sooner; I only just realized that I had an answer from you (I was waiting for an email, my fault!). Thanks for the solution. I find it clear and nice.

But now I add a further complication and nesting problem: what if I want to annotate the LTRs of repeat A? For the moment, let us suppose that one is on the first fragment and the other on the second fragment. My solution would be to split the problem into two subproblems: first, annotate the repeat region and its LTRs, both for A and for B (in this case A would include repeat B); second, annotate repeat A with the nested structure. E.g.

|--------Afrag-------------||---B----------------||-------------A-frag-----|

with LTR_A1 and LTR_A2 at the ends of the A-fragments:

##gff-version 3

seq_name program nested_repeat 84000 89000 . . . ID=ID1;Name=ID1
seq_name program direct_repeat 84000 84500 . + . Parent=ID1
seq_name program direct_repeat 88500 89000 . + . Parent=ID1
seq_name program repeat_fragment 84000 85000 . + . Parent=ID1
seq_name program repeat_fragment 88000 89000 . + . Parent=ID1
seq_name program repeat_region 85001 87999 . . . ID=ID2;Name=ID2
seq_name program direct_repeat 85001 85500 . + . Parent=ID2
seq_name program direct_repeat 87500 87999 . + . Parent=ID2

Does that sound OK to you? We can then think of repeat B inserting into one of A's LTRs, but the solution would be very similar, with three direct_repeats for element A (two of which would implicitly be fragments). Or would you prefer to make fragments for direct_repeats and similar features as well?
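The Parent links in a listing like the one above can be checked mechanically. The following is a minimal Python sketch, assuming tab-separated GFF3 columns; the helper names (parse_attributes, parse_gff3, children_by_parent) are hypothetical and belong to no existing library. The rows are copied from the listing and re-joined with tabs.

```python
from collections import defaultdict

# Rows copied from the listing; fields are re-joined with tabs because
# GFF3 columns are tab-separated.
ROWS = [
    "seq_name program nested_repeat 84000 89000 . . . ID=ID1;Name=ID1",
    "seq_name program direct_repeat 84000 84500 . + . Parent=ID1",
    "seq_name program direct_repeat 88500 89000 . + . Parent=ID1",
    "seq_name program repeat_fragment 84000 85000 . + . Parent=ID1",
    "seq_name program repeat_fragment 88000 89000 . + . Parent=ID1",
    "seq_name program repeat_region 85001 87999 . . . ID=ID2;Name=ID2",
    "seq_name program direct_repeat 85001 85500 . + . Parent=ID2",
    "seq_name program direct_repeat 87500 87999 . + . Parent=ID2",
]
GFF3_TEXT = "##gff-version 3\n" + "\n".join("\t".join(r.split()) for r in ROWS)


def parse_attributes(field):
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    return dict(pair.split("=", 1) for pair in field.split(";") if pair)


def parse_gff3(text):
    """Return one dict per feature line, skipping blanks and '#' lines."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        features.append({
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attrs": parse_attributes(cols[8]),
        })
    return features


def children_by_parent(features):
    """Group features under each ID named in their Parent attribute."""
    grouped = defaultdict(list)
    for feature in features:
        parents = feature["attrs"].get("Parent", "")
        # GFF3 allows several comma-separated parents per feature.
        for parent_id in filter(None, parents.split(",")):
            grouped[parent_id].append(feature)
    return grouped


for parent_id, kids in children_by_parent(parse_gff3(GFF3_TEXT)).items():
    print(parent_id, [kid["type"] for kid in kids])
```

Grouping by Parent makes it easy to see which feature types end up under nested_repeat and which under the inserted repeat_region, which is the relationship the ontology has to permit.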

srynobio commented 9 years ago

Dear Karen,

I also have a further solution to the previous problem (sorry if the lines are broken):

||---LTR_A1---||---repeat_fragment_A1---||---B---||---repeat_fragment_A2---||---LTR_A2---||

with LTRs and repeat fragments not overlapping, and their union giving the whole repeat A:

##gff-version 3

seq_name program nested_repeat 84000 89000 . . . ID=ID1;Name=ID1
seq_name program direct_repeat 84000 84500 . + . Parent=ID1
seq_name program repeat_fragment 84501 85000 . + . Parent=ID1
seq_name program repeat_fragment 88000 88499 . + . Parent=ID1
seq_name program direct_repeat 88500 89000 . + . Parent=ID1
seq_name program repeat_region 85001 87999 . . . ID=ID2;Name=ID2
seq_name program direct_repeat 85001 85500 . + . Parent=ID2
seq_name program direct_repeat 87500 87999 . + . Parent=ID2

Gosh, while rereading a previous discussion we had (ID: 2990487), I just realized that direct_repeat is still not part of a repeat_region (as you suggested in that discussion). And actually, in my case, which concerns LTR retrotransposons, I should use LTR and LTR_retrotransposon instead of direct_repeat and repeat_region/nested_repeat, respectively; but then all the parent relationships will fail :-(

I would say that all the terms I am talking about (and many more, such as all the transposon-related ones) are actually repeat_region(s). Could you please think about the problem? In this case it looks like the structure needs to be changed substantially.

Best regards,
Simone
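The claim that the non-overlapping pieces add up to the whole of repeat A (plus the insert B) can be verified with a few lines of Python. This is only a sketch: the coordinates are copied from the second listing and treated as 1-based inclusive, and the function names tiles_exactly and length_by_type are hypothetical.

```python
# Coordinates copied from the second listing (1-based, inclusive ends).
PARENT = (84000, 89000)  # nested_repeat ID1
PARTS = [
    ("direct_repeat", 84000, 84500),
    ("repeat_fragment", 84501, 85000),
    ("repeat_region", 85001, 87999),  # the inserted repeat B (ID2)
    ("repeat_fragment", 88000, 88499),
    ("direct_repeat", 88500, 89000),
]


def tiles_exactly(parent, parts):
    """True if the parts cover the parent end to end, with no gap or overlap."""
    start, end = parent
    expected = start
    for _, part_start, part_end in sorted(parts, key=lambda p: p[1]):
        if part_start != expected or part_end < part_start:
            return False
        expected = part_end + 1
    return expected == end + 1


def length_by_type(parts):
    """Sum the inclusive lengths of the parts, per feature type."""
    totals = {}
    for kind, part_start, part_end in parts:
        totals[kind] = totals.get(kind, 0) + (part_end - part_start + 1)
    return totals


print(tiles_exactly(PARENT, PARTS))
print(length_by_type(PARTS))
```

With overlapping children, as in the first listing where each direct_repeat lies inside a repeat_fragment, the same check fails, and per-type lengths would count the LTR bases twice.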

srynobio commented 9 years ago

Hi Simone,

OK, I have thought about this all afternoon, and I think the best way to deal with it, both for repeat_region and for transposable_element, is for the fragment to be a part_of child of the highest appropriate term. That way any subclass of the term may contain a fragment. So how about this:

repeat_region
  (p) repeat_fragment
transposable_element
  (p) transposon_fragment

The other option is to make a kind of fragment for every kind of repeat, which is a never-ending process.

What we need to think about here is the annotation. You would have to annotate the whole thing, including the insert, with the parent term, i.e. LTR_retrotransposon. You could optionally also annotate the same sequence as a nested_transposon. You would then annotate the insert and the fragments. Does this work for the software? Can you annotate the same region with two tags?

I also see the need to make transposable_element a kind of repeat_region, but let's discuss that later. If you want to set up a conf call, Skype, or email, let me know.

--Karen
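The idea of hanging the fragment off the highest appropriate term, so that every subclass inherits it, can be illustrated with a toy check. The sketch below is an assumption-laden example: the IS_A and PART_OF tables are a small hand-written subset for illustration only, not read from the real SO files, and may_be_part_of is a hypothetical helper.

```python
# Toy, hand-written subset of the hierarchy; NOT loaded from the real SO files.
IS_A = {
    "nested_repeat": "repeat_region",
    "LTR_retrotransposon": "retrotransposon",
    "retrotransposon": "transposable_element",
    "nested_transposon": "transposable_element",
}

# Fragments attached to the highest appropriate term, as proposed above.
PART_OF = {
    "repeat_fragment": "repeat_region",
    "transposon_fragment": "transposable_element",
}


def ancestors(term):
    """Yield the term itself, then each is_a ancestor up to the root."""
    while term is not None:
        yield term
        term = IS_A.get(term)


def may_be_part_of(child_type, parent_type):
    """True if child_type is part_of parent_type or one of its is_a ancestors."""
    whole = PART_OF.get(child_type)
    return whole is not None and whole in ancestors(parent_type)


print(may_be_part_of("transposon_fragment", "LTR_retrotransposon"))
print(may_be_part_of("repeat_fragment", "nested_repeat"))
print(may_be_part_of("repeat_fragment", "LTR_retrotransposon"))
```

In this toy table the last check fails because transposable_element is not a kind of repeat_region, which is the open point deferred for later discussion above.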

srynobio commented 9 years ago

Hi Karen,

thanks for thinking about it. OK, I agree that we should not complicate things too much. The proposed solution is almost fine, but I would like to provide the gff3 files to gbrowse. Unfortunately, gbrowse does not handle overlapping boxes the way I would like. Moreover, having overlapping boxes that are not related by a part_of relationship makes it a bit hard to compute the total length and statistics of the single components. I therefore prefer the solution I gave in my previous comment, with repeat_fragment(s) and direct_repeat(s) summing up to form a repeat.

Actually, I am a bit confused by the direct_repeat relationship. Here:
http://song.cvs.sourceforge.net/viewvc/song/ontology/so-xp.obo?revision=1.236&view=markup
I find it is not in a part_of relationship with repeat_region, only in an is_a relationship, while here:
http://www.sequenceontology.org/miso/current_release/term/SO:0000314
it is a child of repeat_region...

If you want to set up a conf call to discuss this further, you can contact me via email; otherwise you can close the thread.

Thanks again,
Simone