bergmanlab / drosophila-transposons

Drosophila transposable element canonical sequences
Creative Commons Zero v1.0 Universal

About the file: transposons/misc/D_mel_transposon_sequence_set.fa #28

Closed BioWu closed 6 years ago

BioWu commented 6 years ago

Thanks for all your efforts in establishing such wonderful TE references for Drosophila. I tried to use the file transposons/misc/D_mel_transposon_sequence_set.fa for genome masking, but I found a few bugs in it. The sequences of several elements, such as rooA, are much shorter in this file than the sequences provided in the raw EMBL file. I checked and found that the original sequences contain non-ATCG characters, and the sequences in this file are cut off at those sites. That is why several sequences are shorter in this file. The offending characters are r, y, w, n and s (lower and upper case). I hope this is helpful.
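For anyone who wants to run the same kind of check, here is a minimal sketch that reports the length of each FASTA record and counts its non-ACGT characters. The helper names `read_fasta` and `ambiguity_report` and the toy records are hypothetical, made up for illustration; this is not the script used for the report above.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ambiguity_report(handle):
    """Map each header to (sequence length, Counter of non-ACGT characters).

    Characters are upper-cased before counting, so 'n' and 'N' are pooled.
    """
    report = {}
    for header, seq in read_fasta(handle):
        odd = Counter(c for c in seq.upper() if c not in "ACGT")
        report[header] = (len(seq), odd)
    return report


# Toy input (hypothetical records, not real TE sequences).
example = StringIO(">elementA\nACGTRYAC\nggnnTT\n>elementB\nACGT\n")
for name, (length, odd) in ambiguity_report(example).items():
    print(name, length, dict(odd))
```

To check a real file, pass `open("D_mel_transposon_sequence_set.fa")` instead of the `StringIO` object. Comparing the reported lengths with the lengths in the EMBL file should show which records were cut short.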

cbergman commented 6 years ago

Hi @BioWu. Thanks for reporting this! I won't have time to look at this carefully until next week, but from what I can tell you are correct and this should be fixed. Thanks for spotting this and finding a solution. I will follow up soon with a resolution.

cbergman commented 6 years ago

We have resolved this issue as part of a larger update to the transposon repository, which converted the EMBL file to FASTA + GFF3: https://github.com/bergmanlab/transposons/issues/12. Please use the most recent version of D_mel_transposon_sequence_set.fa, available here: https://github.com/bergmanlab/transposons/tree/master/releases/. Thanks for your patience and for bringing this issue to our attention.