alexdobin / STAR

RNA-seq aligner
MIT License

--soloFeatures GeneFull available in regular STAR for bulk RNA-Seq? #1557

Open · jdrnevich opened this issue 2 years ago

jdrnevich commented 2 years ago

I have some bulk RNA-Seq data for which I want to count reads in introns as well as exons. Is the --soloFeatures GeneFull option only available in STARsolo, or can regular STAR do it as well?

alexdobin commented 2 years ago

Hi Jenny,

there is no direct option for bulk RNA-seq, but you can pretend that you have SmartSeq data with just one "cell", and get the --soloFeatures options working:

--soloType SmartSeq   
--soloFeatures GeneFull [and/or Gene, SJ, GeneFull_ExonOverIntron, GeneFull_Ex50pAS]
--soloStrand Unstranded [or Forward or Reverse]
--soloUMIdedup NoDedup [or Exact: deduplication based on alignment start/end]
--readFilesManifest manifest.txt
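Put together, a full bulk run might look like the sketch below. It assumes a prebuilt STAR index at a hypothetical `GENOME_DIR`, gzipped FASTQs listed in `manifest.txt`, and both GeneFull and Gene counts (several `--soloFeatures` values are given space-separated on the command line). The thread count and output prefix are placeholders. By default the script only prints the command; set `RUN_STAR=1` to actually run STAR.

```shell
#!/usr/bin/env bash
# Sketch: bulk RNA-seq counted through STARsolo's SmartSeq mode.
# GENOME_DIR, MANIFEST, the thread count and the prefix are placeholders.
set -euo pipefail

GENOME_DIR="${GENOME_DIR:-/path/to/STAR_index}"   # hypothetical index location
MANIFEST="${MANIFEST:-manifest.txt}"              # 3-column, tab-separated

star_args=(
  --runThreadN 8                   # placeholder thread count
  --genomeDir "$GENOME_DIR"
  --readFilesManifest "$MANIFEST"
  --readFilesCommand zcat          # FASTQs are gzipped
  --soloType SmartSeq
  --soloFeatures GeneFull Gene     # several features, space-separated
  --soloStrand Unstranded          # or Forward / Reverse
  --soloUMIdedup NoDedup           # or Exact
  --outFileNamePrefix bulk_        # hypothetical prefix
)

if [[ "${RUN_STAR:-0}" == 1 ]]; then
  STAR "${star_args[@]}"
else
  # Dry run: print the shell-quoted command instead of executing it.
  printf '%s' STAR
  printf ' %q' "${star_args[@]}"
  printf '\n'
fi
```

With `RUN_STAR=1 GENOME_DIR=/your/index bash run_bulk_solo.sh`, the counts should land under `bulk_Solo.out/`, with one subdirectory per requested feature.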

The manifest file should contain 3 tab-separated columns:
paired-end reads: read1_file_name <tab> read2_file_name <tab> read_group_line
single-end reads: read1_file_name <tab> - <tab> read_group_line

Spaces, but not tabs, are allowed in file names. If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added to it.
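Because a stray space where a tab should be is easy to miss, a quick check before a long run can help. The sketch below is a hypothetical helper, not part of STAR. It flags lines with fewer than three tab-separated fields. The STAR manual's description of `--readFilesManifest` also allows extra tab-separated @RG fields when column 3 starts with `ID:` (worth double-checking for your STAR version), so the helper accepts more than three fields only in that case. The demo file names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of STAR): sanity-check a manifest file.
set -euo pipefail

check_manifest() {
  # Report lines with fewer than 3 tab-separated fields, and lines with
  # extra fields whose read-group column does not start with "ID:".
  # Exit status is non-zero if any problem was found.
  awk -F'\t' '
    NF < 3 { printf "line %d: expected 3 tab-separated fields, got %d\n", NR, NF; bad = 1; next }
    NF > 3 && $3 !~ /^ID:/ { printf "line %d: extra @RG fields need column 3 to start with ID:\n", NR; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Demo with made-up single-end file names; "-" marks the missing read 2.
# The space in "sample B.fastq.gz" is allowed; a tab would not be.
printf '%s\t%s\t%s\n' \
  'sampleA.fastq.gz'  '-' 'ID:sampleA' \
  'sample B.fastq.gz' '-' 'sampleB' > example_manifest.txt

check_manifest example_manifest.txt && echo "example_manifest.txt looks OK"
```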

It will output the gene/cell count matrix for just one cell. You can also use it for multiple samples, with each sample treated as a separate "cell".
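STARsolo writes counts as a sparse MatrixMarket file with genes as rows and "cells" (here, samples) as columns, typically as `matrix.mtx` next to `features.tsv` and `barcodes.tsv` in a directory such as `Solo.out/GeneFull/raw/`. Treat that layout as an assumption to verify against your STAR version. The hypothetical `sum_per_cell` helper below uses awk to print the total count per sample, a quick sanity check that every sample produced counts. The demo runs on a tiny made-up matrix.

```shell
#!/usr/bin/env bash
# Sketch: total counts per "cell" (sample) from a STARsolo matrix.mtx.
set -euo pipefail

sum_per_cell() {
  # $1: matrix.mtx in MatrixMarket coordinate format (genes x cells)
  # $2: barcodes.tsv with one cell name per line, in column order
  awk '
    NR == FNR  { name[FNR] = $1; ncell = FNR; next }   # first file: cell names
    /^%/       { next }                                # MatrixMarket header/comments
    !seen_dims { seen_dims = 1; next }                 # "rows cols nonzeros" line
               { total[$2] += $3 }                     # column index = cell
    END { for (c = 1; c <= ncell; c++) printf "%s\t%s\n", name[c], total[c] + 0 }
  ' "$2" "$1"
}

# Demo with made-up data: 3 genes, 2 samples, 4 non-zero entries.
mkdir -p demo_solo
cat > demo_solo/matrix.mtx <<'EOF'
%%MatrixMarket matrix coordinate integer general
%
3 2 4
1 1 10
2 1 5
1 2 7
3 2 1
EOF
printf 'sampleA\nsampleB\n' > demo_solo/barcodes.tsv

sum_per_cell demo_solo/matrix.mtx demo_solo/barcodes.tsv
# Real run (path assumed):
# sum_per_cell Solo.out/GeneFull/raw/matrix.mtx Solo.out/GeneFull/raw/barcodes.tsv
```

For the demo this prints `sampleA` with 15 and `sampleB` with 8, tab-separated.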

jdrnevich commented 2 years ago

Thanks for your quick reply! We will try this out and report back how it went.

jdrnevich commented 2 years ago

I am starting to work on this. If I am understanding correctly, I can put all 20 of my samples in the manifest file as separate "cells" and STARsolo will handle them appropriately? For bulk, we usually run a job array with:

--readFilesIn ../data/raw-seq/${line}.fastq.gz \
--readFilesCommand zcat \

Can I still add --readFilesCommand zcat, or do I need to gunzip all my files first?
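Turning the job-array sample list into a manifest takes only a short loop. The sketch below assumes a hypothetical `samples.txt` with one sample name per line (the values the job array substitutes for `${line}`) and single-end gzipped FASTQs under `../data/raw-seq/`. Each sample gets its own read-group ID, which should make it a separate "cell" in the STARsolo output. The demo sample names are made up.

```shell
#!/usr/bin/env bash
# Sketch: build a single-end STARsolo manifest from a sample list.
set -euo pipefail

RAW_DIR="../data/raw-seq"   # same directory as the job-array command
SAMPLES="samples.txt"       # hypothetical: one sample name per line

# Demo only: create a small sample list if none exists (made-up names).
[[ -f "$SAMPLES" ]] || printf 'ctrl_1\nctrl_2\ntreat_1\n' > "$SAMPLES"

# One row per sample: read1 <tab> "-" (no read 2) <tab> read-group ID.
while IFS= read -r line; do
  [[ -n "$line" ]] || continue          # skip blank lines
  printf '%s\t-\tID:%s\n' "$RAW_DIR/${line}.fastq.gz" "$line"
done < "$SAMPLES" > manifest.txt

cat manifest.txt
```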

alexdobin commented 2 years ago

Hi Jenny,

you can use --readFilesCommand zcat for gzipped files. --readFilesManifest simply replaces the --readFilesIn and --outSAMattrRGline options for convenience.

Cheers Alex