I've noticed that when I run STAR with --outReadsUnmapped Fastx on Illumina reads, STAR does not just append the mapping status of the read and its mate to the header. Instead, it rewrites the final field of the original Illumina header in a way that removes the index sequence. It would be very nice to retain the index sequence in the headers of the unmapped reads.
Example of a read in the raw data and in the Unmapped file:
raw data:
@NS500540:129:HKJG2BGX7:4:13402:14458:19861 1:N:0:CGTAAG
unmapped file:
@NS500540:129:HKJG2BGX7:4:13402:14458:19861 0:N: 00
I have only tested this with STAR 2.7.9a, but I didn't see anything about this issue in the changelogs of subsequent releases or in the issue tracker.
Why this would be useful: I ran many samples through STAR at the same time, and they were distinguished only by the index sequence. If the index sequence were retained, I could easily identify which sample each unmapped read came from; right now I can only pinpoint which sequencing run it came from, based on the naming conventions in Illumina FASTQ headers.
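In the meantime, one possible workaround is to look the index sequence up again in the raw FASTQ by read ID. The following is only a minimal Python sketch, not anything STAR provides. It assumes 4-line FASTQ records, that the read ID is everything before the first space in the header, and that the index is the fourth colon-separated field of the Illumina comment. The function names and the " index=" suffix are my own invention for illustration.

```python
def read_id(header):
    """Read ID: the header text before the first whitespace, without the leading '@'."""
    return header[1:].split()[0]


def index_from_header(header):
    """Index sequence from a comment like '1:N:0:CGTAAG', or None if it is absent."""
    parts = header.split()
    if len(parts) < 2:
        return None
    fields = parts[1].split(":")
    return fields[3] if len(fields) == 4 and fields[3] else None


def build_index_map(raw_lines):
    """Map read ID -> index sequence using the header of every 4-line raw record."""
    index_map = {}
    for i, line in enumerate(raw_lines):
        if i % 4 == 0:  # header line of each record
            header = line.rstrip("\n")
            index_map[read_id(header)] = index_from_header(header)
    return index_map


def annotate_unmapped(unmapped_lines, index_map):
    """Yield the unmapped FASTQ lines, appending ' index=SEQ' to each header when known."""
    for i, line in enumerate(unmapped_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            idx = index_map.get(read_id(line))
            if idx:
                line = f"{line} index={idx}"
        yield line
```

Both functions accept any iterable of lines, so open file handles for the raw FASTQ and the unmapped file can be passed in directly. Note that the map holds every read ID from the raw file in memory, which may be impractical for large runs; restricting it to the IDs present in the unmapped file would be the obvious refinement.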