MBHallgren / MINTyper


Output not complete when executed in bash script or snakemake workflow #7

Open alexahess opened 1 year ago


Hello,

I'm trying to run MINTyper as part of a Snakemake workflow. However, when it is run from the workflow or from a bash script, MINTyper seems to exit before it has finished writing its output.

An example is the tree.newick file. When MINTyper is run from a Snakemake workflow or a bash script, the file looks like this:

(((((barcode06_filtered.fsa:0.000000000,barcode08_filtered.fsa:0.000000000):0.000000000,barcode02_filtered.fsa:2.000000000):7567.833333333,barcode03_filtered.fsa:7640.166666667):1220.100000000,(barcode10_filtered.fsa:43.571428571,barcode07_filtered.fsa:58.428571429):5257.400000000;

As you can see, the Newick file is incomplete.

When I run mintyper from the command line, the output is as expected:

(((((barcode06_filtered.fsa:0.000000000,barcode08_filtered.fsa:0.000000000):0.000000000,barcode02_filtered.fsa:3.000000000):10278.600000000,barcode03_filtered.fsa:10347.400000000):1265.750000000,(barcode10_filtered.fsa:59.375000000,barcode07_filtered.fsa:74.625000000):8247.250000000):10645.781250000,barcode09_filtered.fsa:7196.468750000,((barcode12_filtered.fsa:5867.875000000,barcode01_filtered.fsa:5807.125000000):1321.156250000,barcode05_filtered.fsa:7234.343750000):131.781250000);
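One quick way to confirm the difference between the two trees above is to count parentheses: the first tree has 6 opening and 4 closing parentheses, while the second has 8 of each. Below is a minimal sketch of such a check. The function name `newick_is_balanced` is made up for illustration and is not part of MINTyper, and a balanced count is only a necessary condition for a valid Newick string, not a full validation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MINTyper): succeed only if the file
# contains as many '(' as ')' characters.
newick_is_balanced() {
  local file=$1
  local open close
  # tr -cd deletes everything except the given character; wc -c counts what is left.
  # The arithmetic expansion strips any padding that wc adds on some systems.
  open=$(( $(tr -cd '(' < "$file" | wc -c) ))
  close=$(( $(tr -cd ')' < "$file" | wc -c) ))
  if [ "$open" -ne "$close" ]; then
    echo "$file: $open '(' but $close ')'" >&2
    return 1
  fi
}
```

Calling it on the tree.newick file right after the mintyper step in a script or Snakemake rule would make an incomplete tree fail the step instead of passing silently.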

Here is an example of a script that produces the incomplete Newick file. The issue is the same when running it in Snakemake:

#!/bin/bash

# Assign the first and second arguments to variables
group=$1
refSeq=$2

# Create the output folders if they do not exist
echo "Creating output folders..."
mkdir -p results/$group/trimmed
mkdir -p results/$group/mintyper

# Loop over the files in the raw directory that match the pattern barcode*.fastq.gz
echo "Looping over input files and trimming"
for input in data/raw/$group/barcode*.fastq.gz; do
  # Extract the barcode number from the input file name
  barcode=${input##*/barcode}
  barcode=${barcode%.fastq.gz}

  # Define the output file name using the barcode number
  output=results/$group/trimmed/barcode${barcode}_filtered.fastq.gz

  # Run chopper using the input and output file names
  echo "Trimming $input..."
  gunzip -c "$input" | chopper -q 10 --minlength 500 -t 32 | gzip > "$output" & # Add & to run the pipeline in the background
  pid=$! # Store the process ID of the last command in the pipeline (gzip)
  wait $pid # Wait for gzip, the last stage of the pipeline, to finish
done

# Run mintyper using all the output files in the trimmed folder and the reference sequence
echo "Running mintyper"
mintyper --nanopore results/$group/trimmed/* --ref data/references/$refSeq.fasta --output results/$group/mintyper & # Add & to run in background
pid=$! # Store the process ID of mintyper
wait $pid # Wait for mintyper to finish

echo "Done."

It seems like mintyper terminates before the output is fully written. Do you know why this occurs, and whether there is a workaround?
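To narrow down whether mintyper really stops early, it may help to run it in the foreground, keep its output in a log, and look at the exit status. The sketch below is only a diagnostic aid, not a fix: `run_logged` and the log file name are hypothetical, and the mintyper call in the comment reuses only the options from the script above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command in the foreground, write its
# stdout and stderr to a log file, and return the command's exit status.
run_logged() {
  local log=$1
  shift
  local status=0
  "$@" > "$log" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status; see $log" >&2
  fi
  return "$status"
}

# Example call (assumes the same paths as the script above):
# run_logged mintyper.log mintyper --nanopore results/$group/trimmed/* \
#   --ref data/references/$refSeq.fasta --output results/$group/mintyper
```

A non-zero status, especially one above 128 (which in bash means the process was terminated by a signal), would point to mintyper being killed or crashing rather than finishing normally.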