BioWu closed this issue 6 years ago.
Hi @BioWu. Thanks for reporting this! I won't have time to look at this carefully until next week, but from what I can tell, you are correct and this should be fixed. Thanks for spotting this and finding a solution. I will follow up soon with a resolution.
We have resolved this issue as part of a larger update to the transposon repository, which converted the EMBL file to FASTA + GFF3: https://github.com/bergmanlab/transposons/issues/12. Please use the most recent version of the D_mel_transposon_sequence_set.fa available here: https://github.com/bergmanlab/transposons/tree/master/releases/. Thanks for your patience and for bringing this issue to our attention.
Thanks for all your efforts in establishing such wonderful TE references for Drosophila. I tried to use the file
transposons/misc/D_mel_transposon_sequence_set.fa
for genome masking. However, I found a few bugs in this file. Several elements, such as rooA, have strange sequences that are much shorter than the corresponding sequences in the raw EMBL file. I checked and found that these sequences contain non-ACGT characters, and each sequence is cut off at such sites. That's why several sequences are shorter in this file. The characters involved are r, y, w, n, and s (both lowercase and uppercase). Hope this helps.
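To check whether a given copy of the FASTA file still shows this problem, a small script along these lines could be used. It is a minimal sketch assuming a plain FASTA file; the script name `check_fasta.py` and the function names are hypothetical, not part of the repository. It reports each record's length and a count of any characters other than A/C/G/T. In nucleotide data these are typically IUPAC ambiguity codes (R, Y, W, S, N, ...).

```python
import re
import sys
from collections import Counter

# Any character other than A, C, G, T in either case (e.g. IUPAC ambiguity codes).
NON_ACGT = re.compile(r"[^ACGTacgt]")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Every sequence character is kept as-is, so ambiguity codes are
    never dropped and never end a record early.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def report_non_acgt(lines):
    """Return {header: (length, Counter of non-ACGT chars)} for records containing any."""
    report = {}
    for header, seq in read_fasta(lines):
        bad = Counter(NON_ACGT.findall(seq))
        if bad:
            report[header] = (len(seq), bad)
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_fasta.py D_mel_transposon_sequence_set.fa
    with open(sys.argv[1]) as handle:
        for header, (length, bad) in report_non_acgt(handle).items():
            print(f"{header}\tlength={length}\t{dict(bad)}")
```

Because the reader keeps every character, the reported lengths can be compared directly with the EMBL entries. A record that contains ambiguity codes but is much shorter than its EMBL counterpart would point to the truncation described above. A converter that stopped or split records at non-ACGT characters would reproduce the bug.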