friend1ws / nanomonsv

SV detection tool for nanopore sequence reads
GNU General Public License v3.0
78 stars 12 forks source link

ERROR - query end inconsistent when running parse #45

Open asherbryant opened 11 months ago

asherbryant commented 11 months ago

Hello!

We are analyzing COLO829 bams (fastqs from the Valle-Inclan truthset paper that were realigned with minimap2) & getting the following errors:

07/24/2023 03:17:41 - nanomonsv.parse - ERROR - query end inconsistent!! ERR2752452.116: 18078 != 5227
07/24/2023 03:17:41 - nanomonsv.parse - ERROR - query end inconsistent!! ERR2752451.4333814: 5045 != 4714

Below is the script:

#!/bin/bash

# bam files
TUMOR_BAM=/data/Lab/colo829_truthset/colo829_tumor_truthset_minimap2.bam
NORMAL_BAM=/data/Lab/colo829_truthset/colo829_normal_truthset_minimap2.bam

# reference files
HG=/data/Lab/references/human_g1k_v37.fasta

# parameters
SIZE=30
DIRECTORY=/data/Lab/bm/truthset_30bp

# source conda init file
source myconda

# nanomonsv
mkdir $DIRECTORY/nanomonsv
mamba activate nanomonsv
#-----parse-----
nanomonsv parse $TUMOR_BAM $DIRECTORY/nanomonsv/nanomonsv_tumor
nanomonsv parse $NORMAL_BAM $DIRECTORY/nanomonsv/nanomonsv_normal
#----get----
nanomonsv get $DIRECTORY/nanomonsv/nanomonsv_tumor $TUMOR_BAM \
$HG \
--control_prefix $DIRECTORY/nanomonsv/nanomonsv_normal --control_bam $NORMAL_BAM \
--use_racon \
--single_bnd \
--min_indel_size $SIZE \
--debug

Any help would be appreciated!

friend1ws commented 11 months ago

Hi, This data should be OK (because we have actively used this dataset). Could you just in case show me the header describing the alignment options bu e.g., the following command?

samtools view -H ERR2752452.bam | grep PG
asherbryant commented 11 months ago

Thank you for the reply!

Header for colo829_tumor_truthset_minimap2.bam:

@PG ID:minimap2 PN:minimap2 VN:2.26-r1175   CL:minimap2 -ax map-ont --eqx -t 30 /data/Lab/references/human_g1k_v37.fasta /data/Lab/colo829_truthset/fastq/ERR2752452.fastq.gz
@PG ID:samtools PN:samtools PP:minimap2 VN:1.17 CL:samtools sort -@4 -m 4G
@PG ID:samtools.1   PN:samtools PP:samtools VN:1.17 CL:samtools view -H colo829_tumor_truthset_minimap2.bam

I also get a similar for COLO829 made publicly available by Pacbio.

07/25/2023 17:57:23 - nanomonsv.parse - ERROR - query end inconsistent!! m84039_230312_025934_s1/227480300/ccs: 13586 != 1900
07/25/2023 17:57:24 - nanomonsv.parse - ERROR - query end inconsistent!! m84039_230327_233730_s2/157488965/ccs: 909 != 628

Script:

#!/bin/bash

# bam files
TUMOR_BAM=/data/Lab/pacbio/colo829/COLO829.GRCh38.bam
NORMAL_BAM=/data/Lab/pacbio/colo829/COLO829-BL.GRCh38.bam

# reference files
HG=/data/Lab/references/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

# parameters
DIRECTORY=/data/Lab/severus_benchmarking/pacbio

# source conda init file
source myconda

# nanomonsv
mkdir $DIRECTORY/nanomonsv
mamba activate nanomonsv
#-----parse-----
nanomonsv parse $TUMOR_BAM $DIRECTORY/nanomonsv/nanomonsv_tumor
nanomonsv parse $NORMAL_BAM $DIRECTORY/nanomonsv/nanomonsv_normal
#----get----
nanomonsv get $DIRECTORY/nanomonsv/nanomonsv_tumor $TUMOR_BAM \
$HG \
--control_prefix $DIRECTORY/nanomonsv/nanomonsv_normal --control_bam $NORMAL_BAM \
--use_racon \
--single_bnd \
--min_indel_size $SIZE \
--debug

Header for COLO829.GRCh38.bam:

samtools view -H COLO829.GRCh38.bam | grep PG
@PG ID:ccs  PN:ccs  VN:7.0.0 (commit v7.0.0-7-g0c7f1adf)    DS:Generate circular consensus sequences (ccs) from subreads.   CL:/opt/pacbio/tag-ccs-current/bin/ccs --streamed --log-level INFO --stderr-json-log --kestrel-files-layout --movie-name m84039_230312_025934_s1 --log-file metadata/m84039_230312_025934_s1.ccs.log --min-rq 0.9 --non-hifi-prefix fail --knrt-ada --pbdc-model /opt/pacbio/tag-ccs-current/bin/../models/revio_v1.onnx --alarms metadata/m84039_230312_025934_s1.ccs.alarms.json
@PG ID:lima VN:2.7.1 (commit v2.7.1-1-gf067520) CL:/opt/pacbio/tag-lima-current/bin/lima --movie-name m84039_230312_025934_s1 --kestrel-files-layout --quality hifi --output-missing-pairs --shared-prefix --hifi-preset SYMMETRIC-ADAPTERS --store-unbarcoded --split-named --reuse-source-uuid --reuse-biosample-uuids --stderr-json-log --alarms metadata/m84039_230312_025934_s1.hifi_reads.lima.alarms.json --log-file metadata/m84039_230312_025934_s1.hifi_reads.lima.log pb_formats/m84039_230312_025934_s1.hifi_reads.consensusreadset.primrose.xml metadata/m84039_230312_025934_s1.barcodes.fasta hifi_reads/m84039_230312_025934_s1.hifi_reads.demux.bam
@PG ID:pbmm2    PN:pbmm2    VN:1.10.0 (commit v1.10.0)  CL:pbmm2 align /pbi/dept/secondary/siv/smrtlink/smrtlink-appslabht/jobs-root/cromwell-executions/sl_import_fasta/5da8287b-e356-4510-a856-81a98e63b351/call-fasta_to_reference/execution/smrtlink_reference.referenceset.xml /pbi/dept/secondary/siv/smrtlink/smrtlink-appslabht/jobs-root/cromwell-executions/pb_align_ccs/8ce15aa8-502a-4815-b4f0-438794f3692e/call-prepare_input/prepare_input/ad1ca5d2-00af-47e3-8b0c-1b29d11a5884/call-dataset_filter/execution/filtered.consensusreadset.xml mapped.consensusalignmentset.xml --sort --min-gap-comp-id-perc 70.0 --min-length 50 --sample COLO829 --preset HiFi -j 16 --log-level INFO --log-file pbmm2.log --alarms alarms.json
@PG ID:primrose VN:1.4.0 (commit v1.3.0-14-gd9f6a32)    CL:/opt/pacbio/tag-primrose-current/bin/primrose --movie-name m84039_230312_025934_s1 --kestrel-files-layout --quality hifi --reuse-source-uuid --stderr-json-log --log-file metadata/m84039_230312_025934_s1.hifi_reads.primrose.log --alarms metadata/m84039_230312_025934_s1.hifi_reads.primrose.alarms.json
@PG ID:samtools PN:samtools PP:primrose VN:1.17 CL:samtools view -H COLO829.GRCh38.bam
friend1ws commented 11 months ago

Thanks! nanomonsv cannot tread BAM files produced with the --eqx options. Is it important for other purposes? Then, I may consider supporting it.

asherbryant commented 10 months ago

Yes, It would be helpful for us if you could support --eqx option & Pacbio's official minimap2 wrapper, pbmm2. Thank you!

xwBio commented 10 months ago

Hi,

I also have a similar issue and it seems that the current version is still unable to effectively handle PacBio BAM files generated with --eqx option.

It would be appreciated If you could resolve this issue.

Best, Xiwei

friend1ws commented 10 months ago

OK. I will try to solve this problem hopefully in the next version.

xwBio commented 10 months ago

OK. I will try to solve this problem hopefully in the next version.

Thank you!