PacificBiosciences / sawfish

Structural variant discovery and genotyping from mapped PacBio HiFi data

A problem of version 0.12.2 #2

Closed zengxi-hada closed 1 week ago

zengxi-hada commented 3 weeks ago

I am running sawfish v0.12.2 on this BAM file: https://s3-us-west-2.amazonaws.com/human-pangenomics/working/HPRC_PLUS/HG002/analysis/aligned_reads/hifi/GRCh38/HG002_aligned_GRCh38_winnowmap.sorted.bam using the command line:

sawfish discover --threads 16 --ref Homo_sapiens_assembly38.fasta --bam HG002_aligned_GRCh38_winnowmap.sorted.bam --output-dir HG002_discover_dir

The following error occurs:

thread '<unnamed>' panicked at src/refine_sv/trimmed_reads.rs:197:5: assertion failed: !is_hard_clipped(&record.cigar())

Below is the full running log:

[2024-08-25][09:31:28][sawfish][INFO] Starting sawfish
[2024-08-25][09:31:28][sawfish][INFO] cmdline: /home/xiz978/bin/BCH/sawfish-v0.12.2-x86_64-unknown-linux-gnu/bin/sawfish discover --threads 16 --ref /n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta --bam /n/storage_test/xiz978/Wangziying/HG002/hifi/HG002_aligned_GRCh38_winnowmap.sorted.bam --output-dir HG002_discover_dir
[2024-08-25][09:31:28][sawfish][INFO] Running on 16 threads
[2024-08-25][09:31:29][sawfish][INFO] Writing discover settings to file: 'HG002_discover_dir/discover.settings.json'
[2024-08-25][09:31:29][sawfish][INFO] Reading reference genome from file '/n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta'
[2024-08-25][09:31:58][sawfish][INFO] Processing alignment file '/n/storage_test/xiz978/Wangziying/HG002/hifi/HG002_aligned_GRCh38_winnowmap.sorted.bam'
[2024-08-25][09:36:58][sawfish][INFO] Processed alignments on 1,181,630 of 3,099,922 ref genome kb (38%)
[2024-08-25][09:42:05][sawfish][INFO] Processed alignments on 2,595,202 of 3,099,922 ref genome kb (83%)
[2024-08-25][09:43:24][sawfish][INFO] Finished processing all alignments
[2024-08-25][09:43:24][sawfish][INFO] Writing max sv depth genome regions to bed file: 'HG002_discover_dir/max.depth.bed'
[2024-08-25][09:43:24][sawfish][INFO] Getting depth bin GC content
[2024-08-25][09:43:27][sawfish][INFO] Getting GC depth correction levels
[2024-08-25][09:43:27][sawfish][INFO] Completing breakpoint observation clustering
[2024-08-25][09:43:28][sawfish][INFO] Finished breakpoint observation clustering
[2024-08-25][09:43:28][sawfish][INFO] Consolidating adjacent indel candidates
[2024-08-25][09:43:28][sawfish][INFO] Finding large insertion candidates
[2024-08-25][09:43:28][sawfish][INFO] Writing breakpoint_cluster debug bed file: 'HG002_discover_dir/debug.breakpoint_clusters.bed'
[2024-08-25][09:43:28][sawfish][INFO] Refining SV Candidates
[2024-08-25][09:43:28][sawfish][INFO] Writing debug assembly region bed file: 'HG002_discover_dir/assembly.regions.bed'
[2024-08-25][09:43:28][sawfish][INFO] Starting refinement of breakpoint evidence clusters
thread '<unnamed>' panicked at src/refine_sv/trimmed_reads.rs:197:5:
assertion failed: !is_hard_clipped(&record.cigar())
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/var/spool/slurmd/job45032830/slurm_script: line 9: 17052 Aborted                 /home/xiz978/bin/BCH/sawfish-v0.12.2-x86_64-unknown-linux-gnu/bin/sawfish discover --threads 16 --ref /n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta --bam /n/storage_test/xiz978/Wangziying/HG002/hifi/HG002_aligned_GRCh38_winnowmap.sorted.bam --output-dir HG002_discover_dir
[2024-08-25][09:43:41][sawfish][INFO] Starting sawfish
[2024-08-25][09:43:41][sawfish][INFO] cmdline: /home/xiz978/bin/BCH/sawfish-v0.12.2-x86_64-unknown-linux-gnu/bin/sawfish joint-call --threads 16 --sample HG002_discover_dir --output-dir HG002_joint_call_dir
[2024-08-25][09:43:41][sawfish][INFO] Running on 16 threads
[2024-08-25][09:43:41][sawfish][INFO] Reading reference genome from file '/n/data1/bch/genetics/lee/reference/hg38/Homo_sapiens_assembly38.fasta'
[2024-08-25][09:44:10][sawfish][INFO] Reading all sample discovery input
thread '<unnamed>' panicked at /var/lib/bamboo/.cargo/registry/src/index.crates.io-6f17d22bba15001f/unwrap-1.2.1/src/lib.rs:55:25:

I am wondering whether sawfish could be updated in a future version to handle this case?

Thank you.

ctsa commented 2 weeks ago

Thanks, I'll refer you to the answer for this error on the previous issue:

https://github.com/PacificBiosciences/sawfish/issues/1#issuecomment-2310503108

...I will leave this issue open to track an improved error message for this case.
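For reference, the assertion that fires (`!is_hard_clipped(&record.cigar())`) rejects any alignment record whose CIGAR string contains hard-clip operations. A minimal Python sketch of the equivalent check (the function name mirrors the Rust one; this is an illustration, not sawfish's actual code):

```python
import re

# CIGAR operations per the SAM spec: a length followed by one op character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def is_hard_clipped(cigar: str) -> bool:
    """Return True if any operation in the CIGAR string is a hard clip (H)."""
    return any(op == "H" for _, op in CIGAR_OP.findall(cigar))
```

For example, `is_hard_clipped("120H4000M")` is `True` (a hard-clipped supplementary-style record), while `is_hard_clipped("4000M35S")` is `False` (soft clipping only).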

jamesc99 commented 2 weeks ago

Hi,

After upgrading to sawfish v0.12.2, the previous error from #1 is fixed. However, I got another error during the 'discover' step.

[2024-08-25][11:24:45][sawfish][INFO] Starting sawfish
[2024-08-25][11:24:45][sawfish][INFO] cmdline: sawfish discover --bam /users/u250191/ryan_scratch_ln/benchmark_inv/rawdata/hg002/pacbio/minimap2/sorted_PacBio_HG002_m54238_180916_191625.Q20.fastq.bam --ref /users/u250191/ryan_scratch_ln/reference/human-grch38.fasta
[2024-08-25][11:24:45][sawfish][INFO] Running on 8 threads
[2024-08-25][11:24:45][sawfish][INFO] Writing discover settings to file: 'sawfish_discover_output/discover.settings.json'
[2024-08-25][11:24:45][sawfish][INFO] Reading reference genome from file '/users/u250191/ryan_scratch_ln/reference/human-grch38.fasta'
[2024-08-25][11:25:21][sawfish][INFO] Processing alignment file '/users/u250191/ryan_scratch_ln/benchmark_inv/rawdata/hg002/pacbio/minimap2/sorted_PacBio_HG002_m54238_180916_191625.Q20.fastq.bam'
[2024-08-25][11:30:21][sawfish][INFO] Processed alignments on 3,037,641 of 3,088,286 ref genome kb (98%)
[2024-08-25][11:30:25][sawfish][INFO] Finished processing all alignments
[2024-08-25][11:30:25][sawfish][INFO] Writing max sv depth genome regions to bed file: 'sawfish_discover_output/max.depth.bed'
[2024-08-25][11:30:25][sawfish][INFO] Getting depth bin GC content
[2024-08-25][11:30:29][sawfish][INFO] Getting GC depth correction levels
[2024-08-25][11:30:29][sawfish][INFO] Completing breakpoint observation clustering
[2024-08-25][11:30:30][sawfish][INFO] Finished breakpoint observation clustering
[2024-08-25][11:30:30][sawfish][INFO] Consolidating adjacent indel candidates
[2024-08-25][11:30:30][sawfish][INFO] Finding large insertion candidates
[2024-08-25][11:30:30][sawfish][INFO] Writing breakpoint_cluster debug bed file: 'sawfish_discover_output/debug.breakpoint_clusters.bed'
[2024-08-25][11:30:31][sawfish][INFO] Refining SV Candidates
[2024-08-25][11:30:31][sawfish][INFO] Writing debug assembly region bed file: 'sawfish_discover_output/assembly.regions.bed'
[2024-08-25][11:30:31][sawfish][INFO] Starting refinement of breakpoint evidence clusters 
/var/spool/slurm/d/job540156/slurm_script: line 34: 2827442 Illegal instruction     (core dumped) sawfish discover --bam $input_bam --ref $ref38

All BAM files were aligned by minimap2 with the command:

$minimap -Y -ax map-hifi -R "@RG\tSM:${INFOR}\tID:${RG_ID}" $ref38 $FASTQ_FILE --MD -t 8

Some of them worked, while some failed with the error above.
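One cheap thing to rule out for an "Illegal instruction" crash on a cluster is a binary compiled with SIMD instructions that a particular node's CPU lacks. Since the failures here track input files rather than nodes this may well not be the cause, but the check is quick. A small Python sketch (not part of sawfish) to list the relevant CPU flags on a Linux node:

```python
import os
import re

def simd_flags(cpuinfo_text):
    """Extract SIMD-related CPU flags (sse4_2, avx/avx2, avx512*)
    from the text of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(re.findall(r"\b(sse4_2|avx2?|avx512\w*)\b", line))
    return sorted(flags)

if os.path.exists("/proc/cpuinfo"):  # Linux only
    with open("/proc/cpuinfo") as f:
        print(simd_flags(f.read()))
```

Comparing this output between a node where sawfish succeeds and one where it crashes would confirm or eliminate an instruction-set mismatch.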

My script to run sawfish is attached below:

#!/bin/bash
#SBATCH --job-name=sawfish
#SBATCH --output=%x_%A_%a.out
#SBATCH --error=%x_%A_%a.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64gb
#SBATCH --time=72:00:00
#SBATCH --partition=medium
#SBATCH -A proj-fs0002

# Load necessary modules or source conda
source /users/u250191/.bashrc
mamba activate sawfish

# Set reference and work directory
ref38="/users/u250191/ryan_scratch_ln/reference/human-grch38.fasta"
WORK_DIR=$(pwd)

# Set input parameters
input_bam=$1

# Ensure all parameters are provided
if [ -z "$input_bam" ] ; then
    echo "Usage: sbatch script.sh <input_bam>"
    exit 1
fi

# Step 1: sawfish discover
sawfish discover --bam "$input_bam" --ref "$ref38"

# Step 2: sawfish joint-call
sawfish joint-call --sample sawfish_discover_output

I initially thought this was a memory issue, so I increased memory to 144 GB for a 30x coverage BAM file, but sawfish failed again. Since you mentioned that memory per core should be less than 8 GB, I may need instructions from your side.

Thanks again for your time. Best!

ctsa commented 2 weeks ago

Copied this report into a new issue #3

ctsa commented 1 week ago

Related to the original issue on this thread, the v0.12.3 release replaces the panic with a (non-panic) error message explaining that hard-clipped input is not currently accepted.