Add Strandedness Function

Problem

It would be really helpful to have an output that records strandedness. We have been using the ReadsPerGene.out.tab from STAR quantmode, but this is not very high throughput.

@marina-yurieva has come up with a solution that uses Salmon

Her scripts are: /projects/anczukow-lab/scripts/strandness_salmon.sh

#!/bin/bash
#SBATCH -q batch
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --time=72:00:00

# use script: sbatch --chdir . --export=reads="PE",genome="human",fastq="myfastq",out="mydir" /projects/anczukow-lab/scripts/strandness_salmon.sh

module load slurm

if [ $reads == "PE" ]; then
        ls ${fastq}/*_R1*.f*gz > list1.txt
else
        ls ${fastq}/*.fastq.gz > list1.txt
fi

THREADS=`wc -l < list1.txt`

sbatch --chdir . --array=1-$THREADS --export=reads=$reads,genome=$genome,out=$out /projects/anczukow-lab/scripts/strandness_salmon.slurm

/projects/anczukow-lab/scripts/strandness_salmon.slurm

#!/bin/bash
#SBATCH -q batch
#SBATCH --nodes=1
#SBATCH --cpus-per-task=20
#SBATCH --mem-per-cpu=14GB
#SBATCH --time=72:00:00
#SBATCH --job-name=strandness_salmon
#SBATCH --output=strandness_salmon.%A_%a.out
#SBATCH --error=strandness_salmon.%A_%a.err

source /projects/compsci/USERS/yuriem/bin/anaconda3/etc/profile.d/conda.sh
conda activate salmon

if [[ $genome = "human" ]]; then

        index=$(echo "/projects/compsci/USERS/yuriem/bin/genomes/indexes/salmon_GRCh38/GRCh38_gencode_salmon_v1-4/")

elif [[ $genome = "mouse" ]]; then

        index=$(echo "/projects/compsci/USERS/yuriem/bin/genomes/indexes/salmon_mm10/mm10_salmon_v1-4")

else

        echo "We don't have the index for this genome"

fi

if [[ $reads = "PE" ]]; then

        read1=$(cat list1.txt | head -n $SLURM_ARRAY_TASK_ID | tail -n 1)

        read2=$(echo $read1 | sed 's/_R1/_R2/g' | sed 's/R1_/R2_/g' | sed 's/val_1/val_2/g' | sed 's/_1_val/_2_val/g' | sed 's/_1.f/_2.f/g')

        sample=$(echo $read1 | rev | cut -d / -f1 | cut -d _ -f2- | rev)

        salmon quant --threads 18 -i $index -l A -1 ${read1} -2 ${read2} -o ${out}/${sample}

        lib=$(grep "library type" ${out}/${sample}/logs/salmon_quant.log | awk '{print $NF}')

        echo -e ${sample} ' \t ' $lib >> ${out}/library_type_all_samples.txt
else
        singleread=$(cat list1.txt | head -n $SLURM_ARRAY_TASK_ID | tail -n 1)
        sample=$(echo $singleread | rev | cut -d / -f1 | cut -d _ -f2- | rev)
        salmon quant --threads 18 -i $index -l A -r ${singleread} -o ${out}/${sample}

        lib=$(grep "library type" ${out}/${sample}/logs/salmon_quant.log | awk '{print $NF}')

        echo -e ${sample} ' \t ' $lib >> ${out}/library_type_all_samples.txt
fi

This will require:

Adding a Salmon container
Remove the hard-coded Salmon indexes
Integrate it into the pipeline

TheJacksonLaboratory / splicing-pipelines-nf

Add Strandedness Function #314

Problem