TheJacksonLaboratory / splicing-pipelines-nf

Repository for the Anczukow-Lab splicing pipeline
14 stars 9 forks source link

Add Strandedness Function #314

Open angarb opened 2 years ago

angarb commented 2 years ago

Problem

It would be really helpful to have an output that records strandedness. We have been using the ReadsPerGene.out.tab from STAR quantmode, but this is not very high throughput.

@marina-yurieva has come up with a solution that uses Salmon

Her scripts are: /projects/anczukow-lab/scripts/strandness_salmon.sh

#!/bin/bash
#SBATCH -q batch
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --time=72:00:00

# use script: sbatch --chdir . --export=reads="PE",genome="human",fastq="myfastq",out="mydir" /projects/anczukow-lab/scripts/strandness_salmon.sh

module load slurm

if [ $reads == "PE" ]; then
        ls ${fastq}/*_R1*.f*gz > list1.txt
else
        ls ${fastq}/*.fastq.gz > list1.txt
fi

THREADS=`wc -l < list1.txt`

sbatch --chdir . --array=1-$THREADS --export=reads=$reads,genome=$genome,out=$out /projects/anczukow-lab/scripts/strandness_salmon.slurm

/projects/anczukow-lab/scripts/strandness_salmon.slurm

#!/bin/bash
#SBATCH -q batch
#SBATCH --nodes=1
#SBATCH --cpus-per-task=20
#SBATCH --mem-per-cpu=14GB
#SBATCH --time=72:00:00
#SBATCH --job-name=strandness_salmon
#SBATCH --output=strandness_salmon.%A_%a.out
#SBATCH --error=strandness_salmon.%A_%a.err

source /projects/compsci/USERS/yuriem/bin/anaconda3/etc/profile.d/conda.sh
conda activate salmon

if [[ $genome = "human" ]]; then

        index=$(echo "/projects/compsci/USERS/yuriem/bin/genomes/indexes/salmon_GRCh38/GRCh38_gencode_salmon_v1-4/")

elif [[ $genome = "mouse" ]]; then

        index=$(echo "/projects/compsci/USERS/yuriem/bin/genomes/indexes/salmon_mm10/mm10_salmon_v1-4")

else

        echo "We don't have the index for this genome"

fi

if [[ $reads = "PE" ]]; then

        read1=$(cat list1.txt | head -n $SLURM_ARRAY_TASK_ID | tail -n 1)

        read2=$(echo $read1 | sed 's/_R1/_R2/g' | sed 's/R1_/R2_/g' | sed 's/val_1/val_2/g' | sed 's/_1_val/_2_val/g' | sed 's/_1.f/_2.f/g')

        sample=$(echo $read1 | rev | cut -d / -f1 | cut -d _ -f2- | rev)

        salmon quant --threads 18 -i $index -l A -1 ${read1} -2 ${read2} -o ${out}/${sample}

        lib=$(grep "library type" ${out}/${sample}/logs/salmon_quant.log | awk '{print $NF}')

        echo -e ${sample} ' \t ' $lib >> ${out}/library_type_all_samples.txt
else
        singleread=$(cat list1.txt | head -n $SLURM_ARRAY_TASK_ID | tail -n 1)
        sample=$(echo $singleread | rev | cut -d / -f1 | cut -d _ -f2- | rev)
        salmon quant --threads 18 -i $index -l A -r ${singleread} -o ${out}/${sample}

        lib=$(grep "library type" ${out}/${sample}/logs/salmon_quant.log | awk '{print $NF}')

        echo -e ${sample} ' \t ' $lib >> ${out}/library_type_all_samples.txt
fi

This will require:

  1. Adding a Salmon container
  2. Remove the hard-coded Salmon indexes
  3. Integrate it into the pipeline