alexdobin / STAR

RNA-seq aligner
MIT License

Using --readFilesManifest for multi-lane data? #1544

Open stacyhung opened 2 years ago

stacyhung commented 2 years ago

Hi,

Is there a way to use the --readFilesManifest option with --runMode alignReads so that a sample sequenced on multiple lanes with paired-end reads (e.g. two R1 and two R2 FASTQ files for sampleX) produces a single BAM?

So would the bare-bones command need to look something like this?

STAR --runMode alignReads --readFilesIn sampleX_L1_R1.fastq.gz,sampleX_L2_R1.fastq.gz sampleX_L1_R2.fastq.gz,sampleX_L2_R2.fastq.gz --readFilesManifest manifest.txt

This is not a SmartSeq experiment, and each FASTQ is fairly large (1-2 GB). Or would you recommend just merging the FASTQs before running STAR?

Thanks, Stacy

alexdobin commented 2 years ago

Hi Stacy,

Yes, you can use --readFilesManifest instead of --readFilesIn. Simply list all file pairs, each with a read group, in the manifest file, and they will be processed into one BAM.
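A minimal sketch of the approach above for the two-lane sampleX case from the question: the manifest has one tab-separated row per lane (read 1 file, read 2 file, read group), and both lanes share the same read group. The file name manifest.tsv, the read-group label sampleX, and the genome path in the commented STAR line are illustrative assumptions, not values STAR requires.

```shell
#!/usr/bin/env bash
# Sketch: write a STAR manifest for one sample (sampleX) sequenced on two lanes.
# File names follow the question; manifest.tsv and the label "sampleX" are assumptions.
set -euo pipefail

manifest=manifest.tsv
: > "$manifest"   # start from an empty file

for lane in L1 L2; do
  # Three tab-separated columns: read1 <TAB> read2 <TAB> read group.
  # Same read group on every row, so both lanes carry the same RG in the BAM.
  printf '%s\t%s\t%s\n' \
    "sampleX_${lane}_R1.fastq.gz" \
    "sampleX_${lane}_R2.fastq.gz" \
    "sampleX" >> "$manifest"
done

cat "$manifest"

# Then align both lanes in one run (/path/to/genomeDir is a placeholder).
# --readFilesIn is dropped because the manifest replaces it; zcat handles the .gz input.
# STAR --runMode alignReads --genomeDir /path/to/genomeDir \
#      --readFilesManifest manifest.tsv --readFilesCommand zcat \
#      --outSAMtype BAM Unsorted
```

printf is used instead of echo so the separators are guaranteed to be real tab characters, which is what the manifest format expects.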

llrs commented 10 months ago

Many thanks @alexdobin for this great tool and for helping users!

For anyone testing or using this: each lane's data should be on its own tab-separated row in the manifest file. I tried to supply the lanes comma-separated, as on the command line, and it didn't work:

FASTQ/S1_lane1_1.fastq.gz   FASTQ/S1_lane1_2.fastq.gz   S1
FASTQ/S2_lane1_1.fastq.gz   FASTQ/S2_lane1_2.fastq.gz   S2
FASTQ/S3_lane1_1.fastq.gz   FASTQ/S3_lane1_2.fastq.gz   S3
FASTQ/S1_lane2_1.fastq.gz   FASTQ/S1_lane2_2.fastq.gz   S1
FASTQ/S2_lane2_1.fastq.gz   FASTQ/S2_lane2_2.fastq.gz   S2
FASTQ/S3_lane2_1.fastq.gz   FASTQ/S3_lane2_2.fastq.gz   S3
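For many samples and lanes, a manifest like the one above can be generated rather than typed. This is a sketch assuming, hypothetically, that files follow the naming pattern in the example (FASTQ/&lt;sample&gt;_lane&lt;N&gt;_1.fastq.gz and its _2 mate); the function name make_manifest is made up for illustration, and the text before "_lane" is taken as the read group.

```shell
#!/usr/bin/env bash
# Sketch: one manifest row per lane, for files named <sample>_lane<N>_{1,2}.fastq.gz
# (naming pattern assumed from the example above). make_manifest is a hypothetical helper.
set -euo pipefail

make_manifest() {
  local fastq_dir=$1 manifest=$2
  local r1 r2 base sample
  : > "$manifest"   # overwrite any previous manifest
  for r1 in "$fastq_dir"/*_lane*_1.fastq.gz; do
    [[ -e "$r1" ]] || continue               # glob matched nothing
    r2="${r1%_1.fastq.gz}_2.fastq.gz"        # mate file for the same lane
    if [[ ! -e "$r2" ]]; then
      echo "missing mate for $r1" >&2
      return 1
    fi
    base=$(basename "$r1")
    sample=${base%%_lane*}                   # e.g. S1_lane2_1.fastq.gz -> S1
    printf '%s\t%s\t%s\n' "$r1" "$r2" "$sample" >> "$manifest"
  done
}
```

Usage: make_manifest FASTQ manifest.tsv. Note that every row of one manifest goes into the same output BAM of a single STAR run, with samples told apart only by read group; to get one BAM per sample, write one manifest per sample and run STAR once for each.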