alexdobin / STAR

RNA-seq aligner
MIT License

Paired-end sample aligned as multiplexed single-end sample #928

Open YeastCell opened 4 years ago

YeastCell commented 4 years ago

Hi Alex,

I'm working on an application where I need to align paired-end reads independently, and I currently do that by spawning two STAR processes. However, I was wondering if I could instead use the multiplexed alignment mode, where single-end reads are provided as comma-separated files and aligned in one go. This seems to work nicely, but I lack one piece: a way to (efficiently) split each output into two files after the run is completed, one coming from the first mate and the other from the second mate. I'm asking because maybe there is some magic option that I'm not aware of, something like --outFilesSplit, that would do all this crazy work for me? :D

Many thanks, A.

alexdobin commented 4 years ago

Hi Ana,

Are you talking about splitting the output BAM files? One possibility is to add file-specific ReadGroup tags to all reads with --outSAMattrRGline ID:file1 , ID:file2 , ID:file3 (note that this list is separated by commas surrounded by spaces, unlike --readFilesIn, which uses plain commas). This will generate one output BAM file, but you can split it by ReadGroup with the samtools split command.
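As a rough illustration of the suggestion above, here is a minimal Python sketch that only assembles the two command lines; it does not run anything. The genome directory, FASTQ names, the `run_` prefix, the `mate1`/`mate2` read-group IDs, and both helper functions are hypothetical placeholders, and `--outSAMtype BAM Unsorted` is an assumed choice of output type. The `%!` in the samtools split format string expands to the read-group ID.

```python
import shlex


def build_star_command(genome_dir, read_files, prefix="run_", threads=4):
    """Assemble a STAR argv that aligns several single-end files in one run.

    Each input file gets its own read group (mate1, mate2, ...; hypothetical IDs).
    """
    rg_tokens = []
    for i in range(1, len(read_files) + 1):
        if i > 1:
            # Read groups are separated by a comma surrounded by spaces,
            # i.e. the comma is its own command-line argument.
            rg_tokens.append(",")
        rg_tokens.append(f"ID:mate{i}")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # Input files are joined with plain commas, no spaces.
        "--readFilesIn", ",".join(read_files),
        "--outSAMattrRGline", *rg_tokens,
        "--outSAMtype", "BAM", "Unsorted",  # assumed output type
        "--outFileNamePrefix", prefix,
    ]


def build_split_command(bam_path):
    """Assemble a samtools argv that writes one BAM per read group."""
    return ["samtools", "split", "-f", "%!.bam", bam_path]


if __name__ == "__main__":
    star = build_star_command("genome_index", ["R1.fastq", "R2.fastq"])
    split = build_split_command("run_Aligned.out.bam")
    # Print shell-quoted commands for inspection; pass the lists to
    # subprocess.run(..., check=True) to actually execute them.
    print(shlex.join(star))
    print(shlex.join(split))
```

With these placeholder names, the split step would be expected to produce mate1.bam and mate2.bam next to the combined BAM; files such as Log.final.out and SJ.out.tab stay combined.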

On the other hand, if you want to split all outputs (Log.final.out, SJ.out.tab, etc), you would have to run separate jobs.

Cheers, Alex