alexdobin / STAR

RNA-seq aligner
MIT License

confusing error: "EXITING because of fatal PARAMETER error: --clip3pAdapterSeq has to contain 2 values to match the number of mates." #1260

Open malcook opened 3 years ago

malcook commented 3 years ago

I'm rerunning a pipeline after upgrading STAR to 2.7.9a and getting this error, which did not arise before for the identical job.

It is unclear why this might now happen.

There is nothing in recent release notes that seems to pertain.

And the manual continues to read:

"--clip3pAdapterSeq default: - string(s): adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates"

Re-running with 2 values for clip3pAdapterSeq (--clip3pAdapterSeq CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT) still exits with the same message.

Full output log from second run follows:

STAR compilation time,server,dir=2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
##### Command Line:
/n/core/Bioinformatics/analysis/Piotrowski/JSandler/SCI-003911-GPFGRIZ/sequenceanalysis/src/bin/STAR --runThreadN 5 --genomeDir /home/mec/.local/crr/genome/Danio_rerio/GRCz11/annotatio\
n/102/STAR/101bp --genomeLoad LoadAndKeep --limitBAMsortRAM 20000000000 --readMapNumber 66666666666666 --readFilesIn /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_1_TAAGGCGA.f\
astq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2\
223/MOLNG-2662/H5LY2BGXB/n_4_1_TAAGGCGA.fastq.gz /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_2_TAA\
GGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_2_TAAGGCGA.fastq.gz --readFilesCommand \
zcat -f --clip3pAdapterSeq CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT --alignIntronMax -1 --alignIntronMin -1 --outFileNamePrefix ali/AC_LL_H_000.4. --outTmpDir /scratch/STAR-9492-DBgCQsj\
tKy --outTmpKeep None --outFilterMultimapNmax 1 --outFilterMultimapScoreRange 1 --outFilterMismatchNmax 3 --outSAMattributes All --outSAMtype BAM SortedByCoordinate --outSAMreadID Numb\
er --outWigType wiggle --outWigStrand Unstranded --outWigNorm RPM
##### Initial USER parameters from Command Line:
outFileNamePrefix                 ali/AC_LL_H_000.4.
outTmpDir                         /scratch/STAR-9492-DBgCQsjtKy
outTmpKeep                        None
###### All USER parameters from Command Line:
runThreadN                    5     ~RE-DEFINED
genomeDir                     /home/mec/.local/crr/genome/Danio_rerio/GRCz11/annotation/102/STAR/101bp     ~RE-DEFINED
genomeLoad                    LoadAndKeep     ~RE-DEFINED
limitBAMsortRAM               20000000000     ~RE-DEFINED
readMapNumber                 66666666666666     ~RE-DEFINED
readFilesIn                   /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_1_TAAGGCGA.fastq.gz,/n/a\
nalysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_1_TAAGGCGA.fastq.gz   /n/analysis/Piotrowski/js2223/MOLNG-\
2662/H5LY2BGXB/n_1_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_2_TAAGGCGA.fast\
q.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_2_TAAGGCGA.fastq.gz        ~RE-DEFINED
readFilesCommand              zcat   -f        ~RE-DEFINED
clip3pAdapterSeq              CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT        ~RE-DEFINED
alignIntronMax                18446744073709551615     ~RE-DEFINED
alignIntronMin                18446744073709551615     ~RE-DEFINED
outFileNamePrefix             ali/AC_LL_H_000.4.     ~RE-DEFINED
outTmpDir                     /scratch/STAR-9492-DBgCQsjtKy     ~RE-DEFINED
outTmpKeep                    None     ~RE-DEFINED
outFilterMultimapNmax         1     ~RE-DEFINED
outFilterMultimapScoreRange   1     ~RE-DEFINED
outFilterMismatchNmax         3     ~RE-DEFINED
outSAMattributes              All        ~RE-DEFINED
outSAMtype                    BAM   SortedByCoordinate        ~RE-DEFINED
outSAMreadID                  Number     ~RE-DEFINED
outWigType                    wiggle        ~RE-DEFINED
outWigStrand                  Unstranded        ~RE-DEFINED
outWigNorm                    RPM        ~RE-DEFINED
##### Finished reading parameters from all sources
##### Final user re-defined parameters-----------------:
runThreadN                        5
genomeDir                         /home/mec/.local/crr/genome/Danio_rerio/GRCz11/annotation/102/STAR/101bp
genomeLoad                        LoadAndKeep
readFilesIn                       /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_1_TAAGGCGA.fastq.gz,\
/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_1_TAAGGCGA.fastq.gz   /n/analysis/Piotrowski/js2223/MO\
LNG-2662/H5LY2BGXB/n_1_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_2_TAAGGCGA.\
fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_2_TAAGGCGA.fastq.gz
readFilesCommand                  zcat   -f
readMapNumber                     66666666666666
limitBAMsortRAM                   20000000000
outFileNamePrefix                 ali/AC_LL_H_000.4.
outTmpDir                         /scratch/STAR-9492-DBgCQsjtKy
outTmpKeep                        None
outSAMtype                        BAM   SortedByCoordinate
outSAMattributes                  All
outSAMreadID                      Number
outWigType                        wiggle
outWigStrand                      Unstranded
outWigNorm                        RPM
outFilterMultimapNmax             1
outFilterMultimapScoreRange       1
outFilterMismatchNmax             3
clip3pAdapterSeq                  CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT
alignIntronMin                    18446744073709551615
alignIntronMax                    18446744073709551615

-------------------------------
##### Final effective command line:
/n/core/Bioinformatics/analysis/Piotrowski/JSandler/SCI-003911-GPFGRIZ/sequenceanalysis/src/bin/STAR   --runThreadN 5   --genomeDir /home/mec/.local/crr/genome/Danio_rerio/GRCz11/annot\
ation/102/STAR/101bp   --genomeLoad LoadAndKeep   --readFilesIn /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY\
2BGXB/n_2_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_1_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_1_TAAGGCGA.fastq.gz   /n\
/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-\
2662/H5LY2BGXB/n_3_2_TAAGGCGA.fastq.gz,/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_2_TAAGGCGA.fastq.gz      --readFilesCommand zcat   -f      --readMapNumber 66666666666666 \
  --limitBAMsortRAM 20000000000   --outFileNamePrefix ali/AC_LL_H_000.4.   --outTmpDir /scratch/STAR-9492-DBgCQsjtKy   --outTmpKeep None   --outSAMtype BAM   SortedByCoordinate      --\
outSAMattributes All      --outSAMreadID Number   --outWigType wiggle      --outWigStrand Unstranded      --outWigNorm RPM      --outFilterMultimapNmax 1   --outFilterMultimapScoreRang\
e 1   --outFilterMismatchNmax 3   --clip3pAdapterSeq CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT      --alignIntronMin 18446744073709551615   --alignIntronMax 18446744073709551615
----------------------------------------

Number of fastq files for each mate = 4

   Input read files for mate 1 :
-rwxrws--- 1 bioinfo fs_computationalbiology 304495794 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_1_TAAGGCGA.fastq.gz
-rwxrws--- 1 bioinfo fs_computationalbiology 306848608 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_1_TAAGGCGA.fastq.gz
-rwxrws--- 1 bioinfo fs_computationalbiology 303518242 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_1_TAAGGCGA.fastq.gz
-rwxrws--- 1 bioinfo fs_computationalbiology 310751703 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_1_TAAGGCGA.fastq.gz

   readsCommandsFile:
exec > "/scratch/STAR-9492-DBgCQsjtKy/tmp.fifo.read1"
echo FILE 0
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_1_TAAGGCGA.fastq.gz"
echo FILE 1
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_1_TAAGGCGA.fastq.gz"
echo FILE 2
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_1_TAAGGCGA.fastq.gz"
echo FILE 3
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_1_TAAGGCGA.fastq.gz"

   Input read files for mate 2 :
-rwxrws--- 1 bioinfo fs_computationalbiology 307414974 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_2_TAAGGCGA.fastq.gz
-rwxrws--- 1 bioinfo fs_computationalbiology 310573434 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_2_TAAGGCGA.fastq.gz
-rwxrws--- 1 bioinfo fs_computationalbiology 306525978 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_2_TAAGGCGA.fastq.gz
-rwxrws--- 1 bioinfo fs_computationalbiology 314450649 Apr  6  2019 /n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_2_TAAGGCGA.fastq.gz

   readsCommandsFile:
exec > "/scratch/STAR-9492-DBgCQsjtKy/tmp.fifo.read2"
echo FILE 0
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_1_2_TAAGGCGA.fastq.gz"
echo FILE 1
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_2_2_TAAGGCGA.fastq.gz"
echo FILE 2
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_3_2_TAAGGCGA.fastq.gz"
echo FILE 3
zcat   -f      "/n/analysis/Piotrowski/js2223/MOLNG-2662/H5LY2BGXB/n_4_2_TAAGGCGA.fastq.gz"

ParametersSolo: --soloCellFilterType CellRanger2.2 filtering parameters:  3000 0.99 10

EXITING because of fatal PARAMETER error: --clip3pAdapterSeq has to contain 2 values to match the number of mates.
SOLUTION: specify 2values in --clip3pAdapterSeq , for no clipping use -
Jun 05 18:40:02 ...... FATAL ERROR, exiting

alexdobin commented 3 years ago

Hi Malcolm,

yes, the behavior was changed in 2.7.8a, and it is mentioned in the CHANGES.md:

Cheers Alex

malcook commented 3 years ago

Ok, thanks, but,

  1. This is not documented in the manual and should be. Manual for 2.7.9a still reads “string(s): adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates” and provides no example of “specifying the values for all read mates”
  2. Anticipating this might have been the issue, I tried a few different ways of “specifying the values for all read mates” and could not figure it out for my case, where I have multiple read1s and corresponding read2s. Can you give an example please? For example, if

readFilesIn='a1.1.fq,a2.1.fq,a3.1.fq a1.2.fq,a2.2.fq,a3.2.fq'

should I then have

clip3pAdapterSeq="CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT"

Or maybe space delimited?

clip3pAdapterSeq="CTGTCTCTTATACACATCT CTGTCTCTTATACACATCT CTGTCTCTTATACACATCT"

Or maybe I need to specify it for both reads in the read pair, as

clip3pAdapterSeq="CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT,CTGTCTCTTATACACATCT"

??

I’m guessing it’s the first (being the one I didn’t try yet).

alexdobin commented 3 years ago

Hi Malcolm,

will fix the manual, thanks! The values are space-separated and specified for read-1 and read-2 only. If you have multiple files for each mate, these values will be considered the same for all files.
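
For the three-files-per-mate case asked about above, this would presumably look like the sketch below. The file names are the hypothetical ones from the question, and the script only prints the resulting command instead of running STAR, so it is an illustration of the argument layout rather than a tested STAR invocation.

```shell
# Hypothetical inputs, mirroring the file layout asked about above.
ADAPTER=CTGTCTCTTATACACATCT
READ1=a1.1.fq,a2.1.fq,a3.1.fq   # all read-1 files, comma-separated
READ2=a1.2.fq,a2.2.fq,a3.2.fq   # all read-2 files, comma-separated

# Mates are space-separated; the adapter is given once per mate
# (two values in total), not once per file.
set -- --readFilesIn "$READ1" "$READ2" --clip3pAdapterSeq "$ADAPTER" "$ADAPTER"

# Print the command rather than run it (STAR and the files may not be present).
N_ARGS=$#
STAR_CMD="STAR $*"
echo "$STAR_CMD"
```

So two adapter values are enough here, regardless of how many files each mate has.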

Cheers alex

bapoorva commented 3 years ago

Hi Alex,

I have the exact same problem and I'm not entirely sure how to specify the adapter seq with --clip3pAdapterSeq. I tried comma-separated, space-separated, with and without quotes, and all of them return the same error

STAR --runThreadN 20  --limitBAMsortRAM  10000000000 --runMode alignReads --genomeDir ~/share/mm10_STAR --outSAMtype BAM SortedByCoordinate --clip3pAdapterSeq "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" --outReadsUnmapped Fastx --quantMode GeneCounts --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --outFilterScoreMinOverLread 0.4 --outFilterMatchNminOverLread 0.4  --alignIntronMin 20   --alignIntronMax 1000000  --alignSJoverhangMin 8   --alignSJDBoverhangMin 1 --sjdbScore 1 --alignMatesGapMax 1000000 --readFilesCommand zcat --readFilesIn fastq/5-1264_R1_001.fastq.gz fastq/5-1264_R2_001.fastq.gz --outFileNamePrefix Klf5/STAR/

EXITING because of fatal PARAMETER error: --clip3pAdapterSeq has to contain 2 values to match the number of mates.
SOLUTION: specify 2values in --clip3pAdapterSeq , for no clipping use -

Thanks Apoorva

bapoorva commented 3 years ago

Specifying the --clip3pAdapterMMp parameter along with --clip3pAdapterSeq fixed the issue.
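
A rough sketch of what the two clipping options might look like together, with one value per mate for each. The 0.1 used for --clip3pAdapterMMp is only a placeholder assumption, not necessarily the value used in the run above, and the script prints the options rather than calling STAR.

```shell
ADAPTER=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
MMP=0.1   # placeholder mismatch proportion (an assumption, adjust as needed)

# One adapter and one mismatch proportion per mate, space-separated.
set -- --clip3pAdapterSeq "$ADAPTER" "$ADAPTER" --clip3pAdapterMMp "$MMP" "$MMP"

CLIP_OPTS="$*"
echo "STAR ... $CLIP_OPTS"
```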

alexdobin commented 3 years ago

Hi Apoorva,

also, I would remove the quotes: --clip3pAdapterSeq AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC; otherwise, it may be considered as one sequence.
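
The effect of the quotes can be checked in the shell itself. The small sketch below uses a hypothetical helper, count_args, that just reports how many arguments it received; it shows what the shell passes on, not how STAR parses it afterwards.

```shell
ADAPTER=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

# Quoted: both sequences arrive as a single argument containing a space.
QUOTED=$(count_args "$ADAPTER $ADAPTER")

# Unquoted: the shell splits on the space and passes two arguments.
UNQUOTED=$(count_args $ADAPTER $ADAPTER)

echo "quoted: $QUOTED value(s), unquoted: $UNQUOTED value(s)"
```

This prints "quoted: 1 value(s), unquoted: 2 value(s)".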

Cheers Alex