NBISweden / Earth-Biogenome-Project-pilot

Assembly and Annotation workflows for analysing data in the Earth Biogenome Project pilot project.
https://www.earthbiogenome.org/
GNU General Public License v3.0

New Module: KAT_COMP #1

Closed mahesh-panchal closed 2 years ago

mahesh-panchal commented 2 years ago

Add a new module, KAT_COMP, that runs kat comp to generate a histogram of the k-mer spectra, comparing the reads against an assembly.

Usage (old script):

module load bioinfo-tools KAT

CPUS="${SLURM_NPROCS:-8}"
JOB=$SLURM_ARRAY_TASK_ID

SAMPLE_PREFIX=SampleA_trimmed_no_human_normalised
DATA_DIR=/path/to/reads
FASTA_DIR=/path/to/assemblies
FILES=( "$FASTA_DIR"/*.fasta )      # Quote the directory so paths containing spaces still glob correctly

apply_katcomp () {
    ASSEMBLY="$1"       # The assembly is the first parameter to this function
    READ1="$2"      # The first read of the pair (R1) is the second parameter to this function
    READ2="$3"      # The second read of the pair (R2) is the third parameter to this function
    PREFIX=$( basename "${ASSEMBLY}" .fasta)
    TMP_FASTQ=$(mktemp -u --suffix ".fastq")
    mkfifo "${TMP_FASTQ}"                                   # Make the named pipe in the foreground so it exists before use
    zcat "$READ1" "$READ2" > "${TMP_FASTQ}" &               # Combine the reads into the pipe in the background
    kat comp -H 800000000 -t "$CPUS" -o "${PREFIX}_vs_reads.cmp" "${TMP_FASTQ}" "$ASSEMBLY"     # Compare Reads to Assembly
    rm "${TMP_FASTQ}"
}

FASTA="${FILES[$JOB]}"
apply_katcomp "$FASTA" "$DATA_DIR/${SAMPLE_PREFIX}_R"{1,2}.fastq.gz
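
The named-pipe step in the script above can be sketched in isolation. This is only a minimal illustration under some assumptions: the function name `combine_reads_via_fifo` is hypothetical, `gzip -dc` stands in for `zcat`, and `cat` stands in for the `kat comp` call as the command that reads from the pipe. The pipe is created before the background writer starts, so no `sleep` is needed, and it lives in a private temporary directory that is removed afterwards.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original script): stream two gzipped
# files through a named pipe and hand the pipe path to a consumer command.
# Usage: combine_reads_via_fifo R1.fastq.gz R2.fastq.gz consumer [args...]
combine_reads_via_fifo () {
    local read1="$1" read2="$2"
    shift 2
    local tmp_dir fifo writer_pid status=0

    tmp_dir=$(mktemp -d)                # Private directory to hold the pipe
    fifo="${tmp_dir}/reads.fastq"
    mkfifo "${fifo}"                    # The pipe exists before anything opens it

    # Background writer: decompress both files into the pipe.
    gzip -dc "${read1}" "${read2}" > "${fifo}" &
    writer_pid=$!

    # Foreground consumer: the pipe path is appended as its last argument.
    if ! "$@" "${fifo}"; then
        status=1
        # The consumer may never have opened the pipe; stop the blocked writer.
        kill "${writer_pid}" 2>/dev/null || true
    fi
    wait "${writer_pid}" 2>/dev/null || status=$?

    rm -rf "${tmp_dir}"                 # Remove the pipe and its directory
    return "${status}"
}

# Demo with two tiny gzipped FASTQ files and `cat` as the consumer.
demo_dir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n' | gzip > "${demo_dir}/R1.fastq.gz"
printf '@r1/2\nTGCA\n+\nIIII\n' | gzip > "${demo_dir}/R2.fastq.gz"
combine_reads_via_fifo "${demo_dir}/R1.fastq.gz" "${demo_dir}/R2.fastq.gz" cat
rm -rf "${demo_dir}"
```

In the script above, the consumer would be the `kat comp` command itself, provided it accepts the pipe path where this sketch appends it.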