DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0
464 stars 113 forks source link

`--read-lengths arg must be at least 20` (HISAT2 version 2.2.0) #245

Closed kubu4 closed 4 years ago

kubu4 commented 4 years ago

I'm getting the following error when attempting to perform an alignment:

--read-lengths arg must be at least 20
HISAT2 version 2.2.0 by Daehwan Kim (infphilo@gmail.com, www.ccb.jhu.edu/people/infphilo)
Usage:

### I've removed the usage instructions, for brevity

Error: Encountered internal HISAT2 exception (#1)
Command: /gscratch/srlab/programs/hisat2-2.2.0/hisat2-align-s --wrapper basic-0 --threads 28 -x cbai_transcriptome_v2.0.fasta -q -S cbai_transcriptome_v2.0.fasta.sam --read-lengths 150,67,69,71,57,79,43,65,75,73,61,59,63,77,41,49,55,81,83,47,39,53,45,37,89,87,85,51,91,97,82,93,58,35,99,95,78,101,68,90,103,88,76,70,66,72,105,74,42,86,115,109,100,92,48,98,96,110,54,64,56,46,31,94,62,33,107,102,80,104,123,60,117,44,40,113,106,84,50,141,119,108,120,114,121,111,138,134,127,118,38,125,124,52,133,128,112,149,142,145,143,140,126,148,132,27,135,139,131,36,129,144,137,122,116,147,146,130,32,136,21,25,34,29,23,30,28,19,24,17,22,15

### I've removed the list of -1 and -2 FastQ files, it's a very lengthy list of input files

(ERR): hisat2-align exited with value 1

The command I used to run this is listed below, but I'm looping through a set of transcriptomes and things ran without issue on the four previous transcriptomes. The only difference would be the set of FastQ files that are being aligned and the input transcriptome. Transcriptome indexing appears to complete successfully.

Thanks for any help. Here's the script I'm running:

#!/bin/bash
## Job Name
#SBATCH --job-name=cbai_hisat2_transcriptome_alignments
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=20-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20200616_cbai_hisat2_transcriptome_alignments

###################################################################################
# These variables need to be set by user

# Assign Variables
reads_dir=/gscratch/srlab/sam/data/C_bairdi/RNAseq
transcriptomes_dir=/gscratch/srlab/sam/data/C_bairdi/transcriptomes
threads=28

# Array of the transcriptome FastAs
transcriptomes_array=(
"${transcriptomes_dir}"/cbai_transcriptome_v1.0.fasta \
"${transcriptomes_dir}"/cbai_transcriptome_v1.5.fasta \
"${transcriptomes_dir}"/cbai_transcriptome_v1.6.fasta \
"${transcriptomes_dir}"/cbai_transcriptome_v1.7.fasta \
"${transcriptomes_dir}"/cbai_transcriptome_v2.0.fasta \
"${transcriptomes_dir}"/cbai_transcriptome_v2.1.fasta \
"${transcriptomes_dir}"/cbai_transcriptome_v3.0.fasta \
"${transcriptomes_dir}"/cbai_transcriptome_v3.1.fasta
)

###################################################################################

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability
## Hisat2 requires Python2. Fails with syntax error if using Python3
#module load intel-python3_2017
module load intel-python2_2017

# Program directories
hisat2_dir="/gscratch/srlab/programs/hisat2-2.2.0/"
samtools_dir="/gscratch/srlab/programs/samtools-1.10/samtools"

# Programs array
declare -A programs_array
programs_array=(
[hisat2]="${hisat2_dir}hisat2" \
[hisat2_build]="${hisat2_dir}hisat2-build" \
[samtools_view]="${samtools_dir} view" \
[samtools_sort]="${samtools_dir} sort" \
[samtools_index]="${samtools_dir} index"
)

# Loop through each transcriptome
for transcriptome in "${!transcriptomes_array[@]}"
do

  ## Inititalize arrays
  R1_array=()
  R2_array=()
  reads_array=()

  # Variables
  R1_list=""
  R2_list=""

  # Strip leading path from transcriptome filename
  transcriptome_name="${transcriptomes_array[$transcriptome]##*/}"

  # Capture FastA checksums for verification
  echo "Generating checksum for ${transcriptome_name}"
  md5sum "${transcriptomes_array[transcriptome]}" >> fasta.checksums.md5
  echo "Finished generating checksum for ${transcriptome_name}"
  echo ""

  if [[ "${transcriptome_name}" == "cbai_transcriptome_v1.0.fasta" ]]; then

    reads_array=("${reads_dir}"/20200[15][13][138]*megan*.fq)

    # Create array of fastq R1 files
    R1_array=("${reads_dir}"/20200[15][13][138]*megan*R1.fq)

    # Create array of fastq R2 files
    R2_array=("${reads_dir}"/20200[15][13][138]*megan*R2.fq)

  elif [[ "${transcriptome_name}" == "cbai_transcriptome_v1.5.fasta" ]]; then

    reads_array=("${reads_dir}"/20200[145][13][138]*megan*.fq)

    # Create array of fastq R1 files
    R1_array=("${reads_dir}"/20200[145][13][138]*megan*R1.fq)

    # Create array of fastq R2 files
    R2_array=("${reads_dir}"/20200[145][13][138]*megan*R2.fq)

  elif [[ "${transcriptome_name}" == "cbai_transcriptome_v1.6.fasta" ]]; then

    reads_array=("${reads_dir}"/*megan*.fq)

    # Create array of fastq R1 files
    R1_array=("${reads_dir}"/*megan*R1.fq)

    # Create array of fastq R2 files
    R2_array=("${reads_dir}"/*megan*R2.fq)

  elif [[ "${transcriptome_name}" == "cbai_transcriptome_v1.7.fasta" ]]; then

    reads_array=("${reads_dir}"/20200[145][13][189]*megan*.fq)

    # Create array of fastq R1 files
    R1_array=("${reads_dir}"/20200[145][13][189]*megan*R1.fq)

    # Create array of fastq R2 files
    R2_array=("${reads_dir}"/20200[145][13][189]*megan*R2.fq)

  elif [[ "${transcriptome_name}" == "cbai_transcriptome_v2.0.fasta" ]] \
  || [[ "${transcriptome_name}" == "cbai_transcriptome_v2.1.fasta" ]]; then

    reads_array=("${reads_dir}"/*fastp-trim*.fq)

    # Create array of fastq R1 files
    R1_array=("${reads_dir}"/*R1*fastp-trim*.fq)

    # Create array of fastq R2 files
    R2_array=("${reads_dir}"/*R2*fastp-trim*.fq)

  elif [[ "${transcriptome_name}" == "cbai_transcriptome_v3.0.fasta" ]] \
  || [[ "${transcriptome_name}" == "cbai_transcriptome_v3.1.fasta" ]]; then

    reads_array=("${reads_dir}"/*fastp-trim*20[12][09][01][24]1[48]*.fq)

    # Create array of fastq R1 files
    R1_array=("${reads_dir}"/*R1*fastp-trim*20[12][09][01][24]1[48]*.fq)

    # Create array of fastq R2 files
    R2_array=("${reads_dir}"/*R2*fastp-trim*20[12][09][01][24]1[48]*.fq)

  fi

  # Build hisat2 transcriptome index
  ${programs_array[hisat2_build]} \
  -f "${transcriptomes_array[$transcriptome]}" \
  "${transcriptome_name}" \
  -p ${threads}

  # Create list of fastq files used in analysis
  ## Uses parameter substitution to strip leading path from filename
  printf "%s\n" "${reads_array[@]##*/}" >> "${transcriptome_name}".fastq.list.txt

  # Create comma-separated lists of FastQ reads
  R1_list=$(echo "${R1_array[@]}" | tr " " ",")
  R2_list=$(echo "${R2_array[@]}" | tr " " ",")

  # Align reads to assembly
  ${programs_array[hisat2]} \
  --threads ${threads} \
  -x "${transcriptome_name}" \
  -q \
  -1 "${R1_list}" \
  -2 "${R2_list}" \
  -S "${transcriptome_name}".sam \
  2>&1 | tee "${transcriptome_name}".alignment_stats.txt

  # Convert SAM file to BAM
  ${programs_array[samtools_view]} \
  --threads ${threads} \
  -b "${transcriptome_name}".sam \
  > "${transcriptome_name}".bam

  # Sort BAM
  ${programs_array[samtools_sort]} \
  --threads ${threads} \
  "${transcriptome_name}".bam \
  -o "${transcriptome_name}".sorted.bam

  # Index for use in IGV
  ##-@ specifies thread count; --thread option not available in samtools index
  ${programs_array[samtools_index]} \
  -@ ${threads} \
  "${transcriptome_name}".sorted.bam

  # Remove original SAM and unsorted BAM
  rm "${transcriptome_name}".bam "${transcriptome_name}".sam

done

# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

# Capture program options
for program in "${!programs_array[@]}"
do
    {
  echo "Program options for ${program}: "
    echo ""
    ${programs_array[$program]} --help
    echo ""
    echo ""
    echo "----------------------------------------------"
    echo ""
    echo ""
  } &>> program_options.log || true
done
parkchanhee commented 4 years ago

@kubu4 This is a bug that occured while checking the read length. It will be fixed in the next version. Before the new version is released:

OR