CCRGeneticsBranch / khanlab_ngs_pipeline


STARens error related to PICARD OUTPUT #9

Closed: slsevilla closed this issue 1 year ago

slsevilla commented 1 year ago

Example Error Log

/data/khanlab2/processed_DATA/log/STARens.52569355.e

Example error

[Sat Nov 12 14:09:52 2022]
rule STARens:
    input: /data/khanlab/projects/DATA/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_R1.fastq.gz, /data/khanlab/projects/DATA/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_R2.fastq.gz, RMS2436/RMS2436/FQ/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_R1.fastq.gz, RMS2436/RMS2436/FQ/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_R2.fastq.gz
    output: RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV.star.bam, RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV.star.bam.bai, RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HVChimeric.out.junction, RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENS_SJ.out.tab.bed.gz, RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV.ENS_transcriptome.bam
    jobid: 0
    wildcards: subject=RMS2436, TIME=RMS2436, sample=RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV

        #########################################
        # set LOCAL
        cd ${LOCAL}/

        # set tmp dir       
        if [[ -d "/lscratch/$SLURM_JOB_ID" ]]; then 
            TMPDIR="/lscratch/$SLURM_JOB_ID"
        else
            TMPDIR=tmp/star_tmp
            if [[ ! -d $TMPDIR ]]; then mkdir -p $TMPDIR; fi
        fi

        # Run STAR
        STAR    --genomeDir /data/khanlab/projects/ngs_pipeline_testing/References_4.0/New_GRCh37/Index/STAR_2.7.8a         --readFilesIn /data/khanlab/projects/DATA/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_R1.fastq.gz /data/khanlab/projects/DATA/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/Sample_RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_R2.fastq.gz       --readFilesCommand zcat         --outFileNamePrefix $TMPDIR/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENS  --runThreadN ${THREADS}         --twopassMode Basic         --outSAMunmapped Within         --chimSegmentMin 12         --chimJunctionOverhangMin 12        --alignSJDBoverhangMin 10   --alignMatesGapMax 100000       --alignIntronMax 100000         --chimSegmentReadGapMax 3       --outFilterMismatchNmax 2       --outSAMtype BAM Unsorted       --quantMode TranscriptomeSAM

        echo "STAR ENS mapping completed"

        # sort file
        samtools sort -m 150G -T $TMPDIR $TMPDIR/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENSAligned.out.bam -o $TMPDIR/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENSAligned.sortedByCoord.out.bam

        # MOVE JUNCTIONS FILE
        mv -f $TMPDIR/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENSChimeric.out.junction /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HVChimeric.out.junction

        # MOVE TRANSCRIPTOME BAM
        mv -f $TMPDIR/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENSAligned.toTranscriptome.out.bam /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV.ENS_transcriptome.bam

        #MOVE TAB
        mv -f $TMPDIR/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENSSJ.out.tab /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENS_SJ.out.tab

        # RUN PICARD
        java -Xmx75g -Djava.io.tmpdir=$TMPDIR -jar $PICARD_JAR AddOrReplaceReadGroups VALIDATION_STRINGENCY=SILENT INPUT=$TMPDIR/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENSAligned.sortedByCoord.out.bam /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV.star.bam SORT_ORDER=coordinate RGLB=RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV RGPU=RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV RGPL=ILLUMINA RGSM=RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV RGCN=khanlab

        # index
        samtools index /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV.star.bam

        # filt tabix
        awk -F"\t" 'BEGIN{OFS="\t"}{strand=($4==1)?"+":"-";annotated=($6==1)?"true":"false";if($5==0) motif="non-canonical"; if($5==1)motif="GT/AG";if($5==2)motif="CT/AC";if($5==3)motif="GC/AC";if($5==4)motif="CT/GC";if($5==5)motif="AT/AC";if($5==6)motif="GT/AT";print $1,$2,$3,"motif="motif";uniquely_mapped="$7";multi_mapped="$8";maximum_spliced_alignment_overhang="$9";annotated_junction="annotated,$7,strand}' /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENS_SJ.out.tab |bedtools sort |bgzip > /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENS_SJ.out.tab.bed.gz
        tabix -0 -p bed /data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV_ENS_SJ.out.tab.bed.gz
        ##########################################

Activating environment modules: STAR/2.7.8a, bedtools/2.22.0, picard/1.129, samtools/0.1.19, bcftools/1.13
[-] Unloading snakemake  5.24.1 
[+] Loading STAR  2.7.8a 
[+] Loading bedtools  2.22.0 
[+] Loading gcc  9.2.0  ... 
[-] Unloading gcc  9.2.0  ... 
[+] Loading gcc  9.2.0  ... 
[+] Loading openmpi 4.0.5  for GCC 9.2.0 
[+] Loading ImageMagick  7.0.8  on cn3244 
[+] Loading HDF5  1.10.4 
[-] Unloading gcc  9.2.0  ... 
[+] Loading gcc  9.2.0  ... 
[+] Loading NetCDF 4.7.4_gcc9.2.0 
[+] Loading pandoc  2.17.1.1  on cn3244 
[+] Loading pcre2 10.21  ... 
[+] Loading R 4.2.0 
[+] Loading picard  1.129 
[+] Loading samtools 0.1.19  ... 
[+] Loading samtools 1.13  ... 
ERROR: Invalid argument '/data/khanlab2/processed_DATA//RMS2436/RMS2436/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV/RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV.star.bam'.

USAGE: AddOrReplaceReadGroups [options]

Documentation: http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups

Replaces all read groups in the INPUT file with a single new read group and assigns all reads to this read group in the OUTPUT BAM
Version: 1.129(b508b2885562a4e932d3a3a60b8ea283b7ec78e2_1424706677)
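To find other samples hit by the same bug, the offending argument can be pulled out of the SLURM .e logs. Below is a minimal bash sketch. It assumes that Picard's message always has the exact form shown above (ERROR: Invalid argument '<path>'.). The function name find_invalid_picard_args is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan SLURM/Snakemake .e logs for Picard's
# "ERROR: Invalid argument '<path>'." line and print "<log>: <path>".
# Assumes the message format seen in the STARens log; other errors are ignored.
find_invalid_picard_args() {
    local log arg
    for log in "$@"; do
        # sed -n ... /p prints only the lines where the substitution matched,
        # keeping just the quoted argument (trailing whitespace tolerated).
        sed -n "s/^ERROR: Invalid argument '\(.*\)'\.[[:space:]]*\$/\1/p" "$log" |
            while IFS= read -r arg; do
                printf '%s: %s\n' "$log" "$arg"
            done
    done
}
```

For example, find_invalid_picard_args /data/khanlab2/processed_DATA/log/STARens.*.e would list every STARens log that failed this way. The glob pattern is an assumption based on the log name above.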

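Solution

The OUTPUT= key was missing in front of the output BAM path in the AddOrReplaceReadGroups call. Picard 1.129 therefore treated the bare path as an invalid argument. Adding OUTPUT= to the rule's shell command fixes it: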

        java -Xmx75g -Djava.io.tmpdir=$TMPDIR -jar $PICARD_JAR AddOrReplaceReadGroups VALIDATION_STRINGENCY=SILENT INPUT=$TMPDIR/{wildcards.sample}_ENSAligned.sortedByCoord.out.bam OUTPUT={params.home}/{wildcards.subject}/{TIME}/{wildcards.sample}/{wildcards.sample}.star.bam SORT_ORDER=coordinate RGLB={wildcards.sample} RGPU={wildcards.sample} RGPL=ILLUMINA RGSM={wildcards.sample} RGCN=khanlab
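Picard 1.129 expects every option in KEY=VALUE form, so this class of bug can be caught before the JVM starts. Below is a minimal bash sketch, not the pipeline's actual code. It builds the corrected argument list for the sample in the log and rejects any token without an "=". The variable names SAMPLE and OUTDIR and the function check_picard_args are hypothetical. The command is only printed (dry run), because PICARD_JAR comes from the picard module.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch: assemble AddOrReplaceReadGroups options for the
# sample in the log and verify each one uses Picard 1.x KEY=VALUE syntax.
set -euo pipefail

# Return 1 (and report the token) if any argument lacks "=".
check_picard_args() {
    local arg
    for arg in "$@"; do
        if [[ "$arg" != *=* ]]; then
            echo "bare argument (missing KEY=): $arg" >&2
            return 1
        fi
    done
}

SAMPLE="RMS2436_T1R_TM_AAAT2K2HV_AAAT322HV"                        # from the log
OUTDIR="/data/khanlab2/processed_DATA/RMS2436/RMS2436/${SAMPLE}"   # hypothetical name
TMPDIR="${TMPDIR:-/tmp}"

args=(
    VALIDATION_STRINGENCY=SILENT
    "INPUT=${TMPDIR}/${SAMPLE}_ENSAligned.sortedByCoord.out.bam"
    "OUTPUT=${OUTDIR}/${SAMPLE}.star.bam"    # the key that was missing
    SORT_ORDER=coordinate
    "RGLB=${SAMPLE}" "RGPU=${SAMPLE}" RGPL=ILLUMINA "RGSM=${SAMPLE}" RGCN=khanlab
)

check_picard_args "${args[@]}"
# Dry run: print the command instead of executing it.
echo java -Xmx75g "-Djava.io.tmpdir=${TMPDIR}" -jar '${PICARD_JAR}' AddOrReplaceReadGroups "${args[@]}"
```

The bash array keeps each KEY=VALUE token intact even if a path contains spaces. Run against the original command, the check would have stopped at the bare .star.bam path.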
slsevilla commented 1 year ago

Fixed with pull request https://github.com/CCRGeneticsBranch/ngs_pipeline_4.2/pull/10