MarineBergot closed this issue 1 year ago
GFF is an absurdly general format, and there are many ways to interpret the meaning of individual terms. Unfortunately, there isn't a way to alter the GFF format that RepeatMasker generates; however, a simple script could be written to translate it into whatever format you need. I am not familiar with what TEFinder needs in terms of the Target format, but I suspect that if you prefix all the Target names with "Motif:" and add quotes around them, TEFinder may be happy.
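For what it's worth, the translation suggested above could be sketched roughly as follows. This is only a minimal sketch that rewrites the GFF3-style attribute column quoted in the question (`...;Target=NAME START END;...`) into the `Target "Motif:NAME" START END` form. The function name `to_motif_line` is made up for illustration, tab-separated output is an assumption, and whether TEFinder accepts the result has not been verified.

```python
import re

# Matches the GFF3-style attribute seen in the question, e.g. "Target=(GGC)n 1 81".
TARGET_RE = re.compile(
    r"(?:^|;)Target=(?P<name>\S+) (?P<start>\d+) (?P<end>\d+)(?:;|$)"
)


def to_motif_line(gff3_line):
    """Rewrite one record so column 9 reads: Target "Motif:NAME" START END.

    Hypothetical helper. Returns None for comments, blank lines, and
    records that carry no Target attribute.
    """
    line = gff3_line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    # Nine columns; splitting on any whitespace handles both tab- and
    # space-separated input, and keeps the attribute column in one piece.
    fields = line.split(None, 8)
    if len(fields) < 9:
        return None
    match = TARGET_RE.search(fields[8])
    if match is None:
        return None
    fields[8] = 'Target "Motif:{name}" {start} {end}'.format(**match.groupdict())
    # Assumption: the downstream tool expects tab-separated columns.
    return "\t".join(fields)
```

To convert a whole file, you would call this on every line and write out the non-None results. Note that simple repeats such as `(GGC)n` are converted too, so you may want to filter on the `class` attribute first.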
Well, yes, of course; changing the format is not my problem. But this file was generated with RepeatMasker, and so was this annotation. I need the name of the transposon, otherwise the software can't run. So I was hoping you might have an idea of which database was used to generate this annotation, so that I could combine the two sources of information to rebuild my GFF.
Oh! You specifically want to know where to find the sequence for a given target? Sure. The two examples you gave above are not transposons but simple repeats generated by TRF. In such cases, rather than a consensus identifier, the TRF repeated unit is reported in parentheses and suffixed with an 'n'. For actual transposable elements, this Target field would contain the consensus ID from the TE library used in the search. The identifier could be a Repbase ID, a Dfam ID, or any custom library ID provided to RepeatMasker. Let me know if that answers your question.
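As a rough illustration of that naming convention, the check below tells a simple-repeat label such as `(GGC)n` apart from anything else. It is just a sketch: `repeat_unit` is a made-up helper name, it only recognizes the parenthesized-unit pattern described above, and other non-TE labels are not handled.

```python
import re

# Simple repeats are labelled as the repeated unit in parentheses plus "n",
# e.g. "(GGC)n" or "(AGC)n".
SIMPLE_REPEAT_RE = re.compile(r"^\((?P<unit>[A-Za-z]+)\)n$")


def repeat_unit(target_name):
    """Return the repeated unit of a simple-repeat label, else None.

    Hypothetical helper: a None result only means the name does not follow
    the "(UNIT)n" pattern, so it is probably a consensus ID from the TE
    library rather than a simple repeat.
    """
    match = SIMPLE_REPEAT_RE.match(target_name)
    return match.group("unit") if match else None
```

Applied to the names in this thread, `(GGC)n` yields the unit `GGC`, while `hAT210-short-Active` and `Fot1Active` yield `None`.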
Yeah, sorry, my question was probably not very clear ^^' Basically, I have a list of transposons (such as Bill (DQ446204.1), Gulliver (AF019750.1 and AF019751.1), MRC1 (DQ446210.1), Pioneer1 (U19367.1), etc.) and I would like to find them in the RepeatMasker GFF provided for the new version of the Chlamydomonas genome (which contains no names, only this ID). So, with the name, NCBI ID, or sequence, can I query Repbase or Dfam to get the corresponding ID in the database, and then find those IDs in my GFF? Thanks!
Can you point me to the GFF file in question? I don't know what they used as a TE library when they generated that file, so I really can't say what type of ID to expect (other than in the Simple Repeat lines). If it's Dfam accession numbers, you can either look them up on the Dfam website (https://www.dfam.org/browse) or translate them into names (if they have names) using the Dfam API:
# Example using curl and jq
% curl -s https://www.dfam.org/api/families/DF0000001 | jq '.name'
"MIR"
Well, apparently, according to the paper (https://www.biorxiv.org/content/10.1101/2022.06.16.496473v1.full), they are using Repbase: "TE sequence was identified in each assembly by providing the latest Chlamydomonas repeat library to RepeatMasker v4.0.9 (Smit et al. 2013-2015). This library features updated consensus models for all Chlamydomonas repeats available in Repbase (https://www.girinst.org/repbase/)". It will be hard for me to point you to the GFF because you need to register on the website to access it, but I can send it to you if needed. If I understand correctly, you need to pay to access Repbase?
That is helpful. So, if they are using Repbase all the Target fields in the GFF (for TE annotations) will refer to Repbase database entries. Repbase is not an open database so you will need to contact GIRI to obtain access to it -- wish it were not so but there is not much I can do about that.
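If it helps, one way to see which library entries the GFF actually refers to is to tally the `Name` attribute over the TE lines and compare the result with your own list. The sketch below assumes the attribute layout quoted in the question (`Name=...;Target=...;class=...`). The function name `te_name_counts` is made up, and skipping the `Simple_repeat` and `Low_complexity` classes is an assumption about what you want to exclude. Whether the names in your file match names like Gulliver is something only the file can tell you.

```python
import re
from collections import Counter

NAME_RE = re.compile(r"(?:^|;)Name=([^;]+)")
CLASS_RE = re.compile(r"(?:^|;)class=([^;]+)")


def te_name_counts(gff3_lines, skip_classes=("Simple_repeat", "Low_complexity")):
    """Count annotations per Name, ignoring classes that are not TEs.

    Hypothetical helper; skip_classes is an assumption and can be adjusted.
    """
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Keep the attribute column (column 9) in one piece.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue
        name = NAME_RE.search(fields[8])
        repeat_class = CLASS_RE.search(fields[8])
        if name is None:
            continue
        if repeat_class and repeat_class.group(1) in skip_classes:
            continue
        counts[name.group(1)] += 1
    return counts
```

Called on an open file handle, `te_name_counts(handle).most_common(20)` lists the twenty most frequent names, which should make it clear what kind of identifiers the library used.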
Hi,
I have a question: is there a way to get the Target Motif from the ID/Name? I just downloaded a GFF3 file created with RepeatMasker and published by a team for the new version of the Chlamydomonas genome (v6: https://www.biorxiv.org/content/10.1101/2022.06.16.496473v1.full). In the GFF I have:
chromosome_01 RepeatMasker similarity 2540266 2540351 30.1 + . ID=2545373.687;Name=(GGC)n;Target=(GGC)n 1 81;class=Simple_repeat
chromosome_01 RepeatMasker similarity 2541035 2541117 27.2 + . ID=2545373.688;Name=(AGC)n;Target=(AGC)n 1 83;class=Simple_repeat
but I need something like this to run TEFinder:
U_39 RepeatMasker similarity 13293 13473 23.7 - . Target "Motif:hAT210-short-Active" 248 428
U_39 RepeatMasker similarity 13616 13701 26.7 - . Target "Motif:hAT210-short-Active" 8 93
U_39 RepeatMasker similarity 13623 13732 25.7 - . Target "Motif:Fot1Active" 28 138
Is there a way to get from one to the other?
Thanks a lot!