Open darked89 opened 1 year ago
Thanks for your suggestion. We will consider adding an optional output file in a more up-to-date TSV format. The current output file format was designed when the first version of tRNAscan-SE was released over 25 years ago. Back then, graphical user interfaces were primitive, and text files served for data visualization and display. Because tRNAscan-SE has been integrated into the genome annotation pipelines at many genome centers, changing the output format would break a lot of existing code. Therefore, we are keeping the current file format for now.
Hello,
I have a modest proposal for improving the output file format:
minus strand entries
tRNAscan-SE minus strand predictions in the output file have "tRNA Begin" > "tRNA End". The same goes for intron positions (if the tRNA is spliced, obviously). This is not an issue for the tRNAs themselves (the BED and FASTA files have the correct 1:142656825-142656896 format/interval description), but the introns have to be flipped. Would it be easier to use the same BED-like start/end/strand numbering scheme in the output?
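To illustrate the flip I mean, here is a minimal Python sketch. The function name `normalize_interval` is made up for this example, and it assumes the begin/end values are 1-based inclusive coordinates where begin > end signals the minus strand:

```python
def normalize_interval(begin: int, end: int) -> tuple[int, int, str]:
    """Return (start, end, strand) with start <= end.

    Assumption: begin > end marks a minus strand feature, and the
    coordinates stay 1-based and inclusive (no conversion to 0-based BED).
    """
    if begin > end:
        return end, begin, "-"
    return begin, end, "+"


# A minus strand tRNA reported as Begin=142656896, End=142656825
print(normalize_interval(142656896, 142656825))
```

The same function could be applied to the intron begin/end pair, so that every interval in a row follows one convention.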
extra spaces
To convert the output to a still human-readable but easy-to-parse TSV, I do:
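A rough Python sketch of this kind of cleanup (an illustration, not necessarily the exact command I use) could look like the following. It assumes the fields are already tab-delimited and only padded with spaces; `strip_padding` is a hypothetical helper name:

```python
def strip_padding(line: str) -> str:
    """Remove the space padding around each tab-separated field.

    Assumption: fields are separated by tabs and padded with spaces
    for visual alignment only.
    """
    fields = line.rstrip("\n").split("\t")
    return "\t".join(field.strip() for field in fields)


# Hypothetical padded line, shortened to four fields
padded = "chr1 \t1 \t100  \t172 \n"
print(repr(strip_padding(padded)))
```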
Since you have a complicated header in the file, I understand the need for the spaces. Which brings me to the next point.
header / TSV
TSV format with named columns seems to be the default. With comment lines (starting with #) at the top, it could be even easier to understand than the current one, and certainly easier to parse. For example: in order to fix the minus strand issue, a "strand" column should be inserted somewhere.
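As a sketch of how simple parsing becomes with such a layout, here is a short Python example. The column names, the comment lines, and the `Gly` value are hypothetical and not an actual tRNAscan-SE format; only the coordinates are taken from the example above:

```python
import csv
import io

# Hypothetical layout: '#' comment lines, one header row, then data rows.
SAMPLE = """\
# hypothetical tRNAscan-SE TSV output
# coordinates are 1-based, start <= end
seqname\ttrna_number\tstart\tend\tstrand\ttrna_type
1\t1\t142656825\t142656896\t-\tGly
"""


def read_commented_tsv(handle):
    """Skip '#' comment lines and return the rows as dicts keyed by column name."""
    rows = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(rows, delimiter="\t"))


records = read_commented_tsv(io.StringIO(SAMPLE))
print(records[0]["strand"], records[0]["start"], records[0]["end"])
```

With named columns, adding a new field such as "strand" would not shift the positions that existing scripts rely on.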
These are just my 0.02$
Thank you for developing and maintaining tRNAscan-SE.
Darek Kedra