cmmr / Marker-MAGu

Trans-Kingdom Marker Gene Pipeline for Taxonomic Profiling of Human Metagenomes
MIT License

samtools sort: failed to read header from "-" #9

Closed: Ulthran closed this issue 8 months ago

Ulthran commented 9 months ago

Hi all, I'm trying to use Marker-MAGu but running into the same error on multiple machines with multiple data sets.

Using version 0.4.0

Marker-MAGu scripts path:
/home/ctbus/Penn/sunbeam/.snakemake/6d3795d503d8186f5672988c3a1aa8db_/lib/python3.12/site-packages/markermagu
Time Update: Starting main bash mapper script for Marker-MAGu @ 02-02-24---11:37:23
Fri Feb  2 11:37:23 EST 2024
Marker-MAGu used arguments
Sample name:                  LONG_2
Read file(s):                 /home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/qc/decontam/LONG_2.fastq.gz
CPUs:                         8
Output directory:             /home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/virus/marker_magu
Trim for quality:             False
Remove host/spikein seqs:     False
filter sequences path:        /home/ctbus/Penn/sunbeam/.snakemake/6d3795d503d8186f5672988c3a1aa8db_/lib/python3.12/site-packages/markermagu/filter_seqs.fna
Temp directory path:          /home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/virus/marker_magu/LONG_2_temp
Keep temp files:              False
Marker-MAGu script directory: /home/ctbus/Penn/sunbeam/.snakemake/6d3795d503d8186f5672988c3a1aa8db_/lib/python3.12/site-packages/markermagu
Marker-MAGu tool version:     0.4.0
Marker-MAGu database used:    /home/ctbus/Penn/marker_magu_db/v1.1/Marker-MAGu_markerDB.fna
Detection setting:            default
/home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/qc/decontam/LONG_2.fastq.gz
Time Update: Concatenating input reads @ 02-02-24---11:37:23
Time Update: running seqkit stats on LONG_2 @ 02-02-24---11:37:23
Time Update: running minimap2 and samtools on LONG_2 @ 02-02-24---11:37:23
[M::mm_idx_gen::103.553*1.48] collected minimizers
[main_samview] fail to read the header from "-".
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"
Time Update: running coverm on LONG_2 @ 02-02-24---11:39:13
[2024-02-02T16:39:13Z INFO  bird_tool_utils::clap_utils] CoverM version 0.7.0
[2024-02-02T16:39:13Z INFO  coverm] Using min-read-percent-identity 90%
[2024-02-02T16:39:13Z INFO  coverm] Using min-read-aligned-percent 50%
[2024-02-02T16:39:13Z INFO  coverm] Writing output to file: /home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/virus/marker_magu/LONG_2_temp/LONG_2.marker-magu.unique_alignment.coverm.tsv
thread 'main' panicked at src/bam_generator.rs:504:33:
Unable to find BAM file /home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/virus/marker_magu/LONG_2_temp/LONG_2.markermagu.sort.bam
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Time Update: running treshold enforcer/abundance calculator Rscript on LONG_2 @ 02-02-24---11:39:13
[1] "arguments found. Running."
Warning message:
In fread(args[1], sep = "\t", header = T, col.names = c("contig",  :
  File '/home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/virus/marker_magu/LONG_2_temp/LONG_2.marker-magu.unique_alignment.coverm.tsv' has size 0. Returning a NULL data.table.
Error in eval(jsub, SDenv, parent.frame()) : object 'contig' not found
Calls: [ ... [.data.table -> eval -> eval -> tstrsplit -> transpose -> strsplit
Execution halted
Removing temp files
Main Output Files Generated in /home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/virus/marker_magu/
/home/ctbus/Penn/sunbeam/projects/double/sunbeam_output/virus/marker_magu/LONG_2.MM_input.seq_stats.tsv
Time Update: Finishing Marker-MAGu LONG_2 @ 02-02-24---11:39:14
##################
##################
##################

Here's a snippet of the input fastq in case it's helpful

@NZ_CP069563.1_22022_22543_4:0:0_2:0:0_0
GCCCCCATCATCGGAGACATCCCCCTGGCCGACGAGGGTAACGTGAAAGCGGGAGGCATCGCTGTCCGCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NZ_CP069563.1_60812_61233_1:0:0_1:0:0_1
AATCATCACGGCCCTTACGTCCGGGGCTACACACGTGTTACAATGGGGGGTACAGAAGGCAGCTAGCGGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NZ_CP069563.1_62297_62816_2:0:0_2:0:0_2
GAATGTCTGCTTCCAAGCCAACATCCTCGCTGTCTTAGCAATCTGACTTCGTTAGTTCAACTTAGTGTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222

These are simulated reads, which is why the quality scores are so uniform. The pipeline also fails with a real data set.

Thanks in advance for any help, Charlie

mtisza1 commented 9 months ago

Hi Charlie,

Thanks for opening this issue. Sorry you had this problem and that the error message is not more straightforward.

The first thing that comes to mind is that your computer or compute node doesn't have enough memory, which caused minimap2 to crash. Since the Marker-MAGu database is so large, minimap2 typically uses up to 66 GB of memory on our HPC.

Let me know if this seems likely. If not, I'll look into it more next week.

Mike
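
As a quick sanity check before launching the pipeline, you could compare available memory against the ~66 GB figure above (a rough sketch for Linux systems; the threshold and the use of /proc/meminfo are assumptions, not part of Marker-MAGu):

```shell
# Warn if available memory is below what the Marker-MAGu minimap2 index roughly needs
need_gb=66
avail_gb=$(awk '/MemAvailable/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
if [ "$avail_gb" -lt "$need_gb" ]; then
    echo "Only ${avail_gb} GB available; minimap2 may be killed by the OOM killer." >&2
fi
```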

mtisza1 commented 9 months ago

I'm now realizing your input file is gzipped (.gz). This is likely the issue. Please try again with your read file decompressed. In the next update I will add a check at the beginning of the program to verify that the input reads are in FASTQ format.
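
Such an input check could look something like this (a minimal sketch, not the actual Marker-MAGu code; the function names are hypothetical):

```python
import gzip

def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes (0x1f 0x8b)."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"

def looks_like_fastq(path):
    """Cheap sanity check on the first record: header starts with '@',
    separator line starts with '+'. Transparently handles gzipped files."""
    opener = gzip.open if looks_gzipped(path) else open
    with opener(path, "rt") as fh:
        lines = [fh.readline().rstrip("\n") for _ in range(4)]
    return len(lines) == 4 and lines[0].startswith("@") and lines[2].startswith("+")
```

A pipeline could then either refuse gzipped input with a clear message or decompress it itself, instead of letting minimap2/samtools fail downstream with the opaque header error above.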

Ulthran commented 8 months ago

Thanks for your quick reply, and sorry for my slow one. Unzipping the inputs and giving it 70 GB of memory has it working now!
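
For anyone hitting the same error: the workaround is simply to decompress the reads before handing them to Marker-MAGu, e.g. (file names here are just the example from this thread):

```shell
# Decompress a copy while keeping the original archive (gzip >= 1.6 for -k)
gunzip -k LONG_2.fastq.gz     # produces LONG_2.fastq next to the .gz

# or write the decompressed reads somewhere else entirely:
zcat LONG_2.fastq.gz > /tmp/LONG_2.fastq
```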