cytham / nanovar

Structural variant caller for low-depth long-read sequencing data
GNU General Public License v3.0

ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False #90

Open lovelycatZ opened 4 months ago

lovelycatZ commented 4 months ago

Hello, cytham

Thanks for providing such a nice tool. I encountered an error when testing a small sample with NanoVar 1.7.0. My command line is "nanovar -x pacbio-ccs -f hg38 demo.sort.bam ~/ref_genome/hg38/hg38.fa ./"

[27/07/2024 18:03:55] - NanoVar started
[27/07/2024 18:03:55] - Checking integrity of input files - Pass
[27/07/2024 18:04:15] - Analyzing read alignments and detecting SVs - Done
[27/07/2024 18:04:19] - Clustering SV breakends - Done
[27/07/2024 18:04:19] - Correcting DUP and detecting TE - Traceback (most recent call last):
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/env_TGS_2/bin/nanovar", line 635, in <module>
    main()
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/env_TGS_2/bin/nanovar", line 445, in main
    run.dup_te_detect(ref_dir, threads, mm, st, data_type) #
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/env_TGS_2/lib/python3.8/site-packages/nanovar/nv_characterize.py", line 201, in dup_te_detect
    self.index2te, self.out_nn = dup_te_analyzer(self.dir, self.out_nn, self.total_out, self.thres, ref_dir, self.refpath, mm, threads, data_type, st, self.debug)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/env_TGS_2/lib/python3.8/site-packages/nanovar/nv_dup_te_detect.py", line 37, in dup_te_analyzer
    index2te = parse_bam_te(bam_te, wk_dir, read2index)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/env_TGS_2/lib/python3.8/site-packages/nanovar/nv_dup_te_detect.py", line 121, in parse_bam_te
    sam = pysam.AlignmentFile(bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 1000, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

And there is also another error in the log file:

[27/07/2024 18:00:29] - INFO - Initialize NanoVar log file
[27/07/2024 18:00:29] - INFO - Version: NanoVar-1.7.0
[27/07/2024 18:00:29] - INFO - Command: /gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/env_TGS_2/bin/nanovar -x pacbio-ccs HB0016.sort.bam /gpfs/hpc/home/lijc/xiangxud/ref_genome/hg38/hg38.fa ./
[27/07/2024 18:00:29] - INFO - Input file: HB0016.sort.bam
[27/07/2024 18:00:29] - INFO - Read type: pacbio-ccs
[27/07/2024 18:00:29] - INFO - Reference genome: /gpfs/hpc/home/lijc/xiangxud/ref_genome/hg38/hg38.fa
[27/07/2024 18:00:29] - INFO - Working directory: ./
[27/07/2024 18:00:29] - INFO - Model: /gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/env_TGS_2/lib/python3.8/site-packages/nanovar/model/ANN.E100B400L3N12-5D0.4-0.2SGDsee11_het_ccs_v1.h5
[27/07/2024 18:00:29] - INFO - Filter file: None
[27/07/2024 18:00:29] - INFO - Minimum number of reads for calling a breakend: 2
[27/07/2024 18:00:29] - INFO - Minimum SV len: 25
[27/07/2024 18:00:29] - INFO - Mapping percent for split-read: 0.05
[27/07/2024 18:00:29] - INFO - Length buffer for clustering: 50
[27/07/2024 18:00:29] - INFO - Score threshold: 1.0
[27/07/2024 18:00:29] - INFO - Homozygous read ratio threshold: 0.75
[27/07/2024 18:00:29] - INFO - Heterozygous read ratio threshold: 0.35
[27/07/2024 18:00:29] - INFO - Number of threads: 1

[27/07/2024 18:00:29] - INFO - Total number of reads in FASTQ/FASTA: -

[27/07/2024 18:00:29] - INFO - NanoVar started
[27/07/2024 18:00:48] - INFO - Input BAM/CRAM file, skipping minimap2 alignment
[27/07/2024 18:00:51] - DEBUG - Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
[27/07/2024 18:00:54] - INFO - Parsing BAM and detecting SVs
[27/07/2024 18:00:54] - INFO - Gap dictionary not loaded.
[27/07/2024 18:00:54] - INFO - Genome size: 3209286105 bases
[27/07/2024 18:00:54] - INFO - Mapped bases: 3652306 bases
[27/07/2024 18:00:54] - INFO - Depth of coverage: 1.0x
[27/07/2024 18:00:54] - WARNING - Sequencing depth is less than 4x, output may not be comprehensive
[27/07/2024 18:00:54] - INFO - Read overlap upper limit: 10

[27/07/2024 18:00:54] - INFO - Total number of mapped reads: 239

[27/07/2024 18:00:54] - INFO - Clustering SV breakends
[27/07/2024 18:00:54] - INFO - Neural network inference
[27/07/2024 18:00:54] - DEBUG - Creating converter from 3 to 5
[27/07/2024 18:00:54] - INFO - Detecting DUP and TE
[27/07/2024 18:00:54] - DEBUG - [ERROR] unknown preset 'map-hifi'

Please help me. I look forward to your reply!
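The console traceback ends in pysam, but the log file carries an earlier hint on a DEBUG line ("[ERROR] unknown preset 'map-hifi'"). As a rough sketch of how such lines could be pulled out of a NanoVar log, assuming the "[timestamp] - LEVEL - message" layout seen in this thread (`find_error_messages` is a hypothetical helper, not part of NanoVar):

```python
import re

# Matches log lines shaped like the ones quoted in this thread, e.g.
# "[27/07/2024 18:00:54] - DEBUG - [ERROR] unknown preset 'map-hifi'"
LOG_LINE = re.compile(r"^\[(?P<time>[^\]]+)\] - (?P<level>\w+) - (?P<message>.*)$")


def find_error_messages(log_text):
    """Return (level, message) pairs for log lines whose message contains '[ERROR]'."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and "[ERROR]" in match.group("message"):
            hits.append((match.group("level"), match.group("message")))
    return hits


# Two lines copied from the log above.
sample = """\
[27/07/2024 18:00:54] - INFO - Detecting DUP and TE
[27/07/2024 18:00:54] - DEBUG - [ERROR] unknown preset 'map-hifi'
"""
print(find_error_messages(sample))
```

Searching on the message text rather than the log level matters here, because the failing external command is reported at DEBUG level.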

cytham commented 4 months ago

Hi @lovelycatZ, may I know which minimap2 version you are using?

minimap2 --version

lovelycatZ commented 4 months ago

It's 2.28-r1209. I installed minimap2 via conda.

lovelycatZ commented 4 months ago

I also used the "-x map-hifi" parameter for the minimap2 alignment.

cytham commented 4 months ago

This is strange; it works on my end. Did you run NanoVar in a conda environment? If so, can you please confirm that the minimap2 version in that environment is 2.28-r1209? Thanks

lovelycatZ commented 4 months ago

I have checked. The minimap2 version I used for alignment was 2.28-r1209, but the one in NanoVar's conda env was 2.15-r905. Was that the reason for the problem?

cytham commented 4 months ago

Thanks for checking, @lovelycatZ. Yes, I think that is the problem. Can you please update minimap2 in NanoVar's conda env to 2.28-r1209 as well? Thanks

lovelycatZ commented 4 months ago

Hi, Cytham

When I aligned my sample using minimap2 v2.15-r905, another error appeared:

nanovar -t 3 HB0016.sort.bam ~/ref_genome/hg38/hg38.fa ./

[27/07/2024 23:33:12] - NanoVar started
[27/07/2024 23:33:12] - Checking integrity of input files - Pass
[27/07/2024 23:33:33] - Analyzing read alignments and detecting SVs - Done
[27/07/2024 23:33:39] - Clustering SV breakends - Done
[27/07/2024 23:33:39] - Correcting DUP and detecting TE - Done
[27/07/2024 23:34:39] - Generating VCF files and report - Warning: the index file is older than the FASTA file.
Traceback (most recent call last):
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/bin/nanovar", line 635, in <module>
    main()
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/bin/nanovar", line 486, in main
    run.vcf_report() #
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/nanovar/nv_characterize.py", line 209, in vcf_report
    create_report(self.dir, self.contig, self.thres, self.rpath, self.refpath, self.rlendict, self.rname,
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/nanovar/nv_report.py", line 140, in create_report
    sv_len_dict(fwd, totalsvlen, tab)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/nanovar/nv_report.py", line 222, in sv_len_dict
    plt.savefig(os.path.join(fwd, 'sv_lengths.png'), bbox_inches='tight', dpi=100, facecolor=fig.get_facecolor(),
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/pyplot.py", line 1023, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/figure.py", line 3378, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/backend_bases.py", line 2342, in print_figure
    self.figure.draw(renderer)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/artist.py", line 95, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/figure.py", line 3175, in draw
    mimage._draw_list_compositing_images(
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/axes/_base.py", line 3064, in draw
    mimage._draw_list_compositing_images(
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/axis.py", line 1388, in draw
    ticks_to_draw = self._update_ticks()
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/axis.py", line 1282, in _update_ticks
    minor_locs = self.get_minorticklocs()
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/axis.py", line 1501, in get_minorticklocs
    minor_locs = np.asarray(self.minor.locator())
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/ticker.py", line 2341, in __call__
    return self.tick_values(vmin, vmax)
  File "/gpfs/hpc/home/lijc/xiangxud/software/miniconda3/envs/TGS_env_2/lib/python3.8/site-packages/matplotlib/ticker.py", line 2358, in tick_values
    raise ValueError(
ValueError: Data has no positive values, and therefore can not be log-scaled.

I also found that the choices for the -x parameter for reference mapping in minimap2 v2.15-r905 are slightly different from those in minimap2 v2.28-r1209:

for minimap2 v2.15-r905:

Preset:
                 - map-pb/map-ont: PacBio/Nanopore vs reference mapping

and for minimap2 2.28-r1209:

Preset:
                 - map-pb/map-hifi/map-ont/map-iclr - CLR/HiFi/Nanopore/ICLR vs reference mapping

Does NanoVar require a specific version number of minimap2?

cytham commented 4 months ago

I think you are getting this error because 0 SVs were called. I have fixed this issue in the v1.7.1-dev branch. You can try it by cloning this branch:

git clone -b v1.7.1-dev https://github.com/cytham/nanovar.git
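The ValueError in the traceback is the one Matplotlib raises when an axis is log-scaled but the data has no positive values, which fits a run with 0 SVs. A minimal sketch of one possible guard follows; it is not the fix in the v1.7.1-dev branch (that code is not shown in this thread), and `pick_axis_scale` is a hypothetical helper:

```python
def pick_axis_scale(values):
    """Return 'log' only when at least one value is positive, otherwise 'linear'."""
    return "log" if any(v > 0 for v in values) else "linear"


# With no SVs called, every count is zero, so a log axis cannot be drawn.
print(pick_axis_scale([0, 0, 0]))
# With some positive counts, a log axis is safe.
print(pick_axis_scale([0, 12, 340]))
```

The returned string could then be passed to Matplotlib's `Axes.set_xscale` or `Axes.set_yscale` before saving the figure.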

And yes, NanoVar requires minimap2 >= 2.19. Sorry that this is unclear.
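For anyone wanting to check this up front, here is a small sketch that compares a minimap2 version string against the 2.19 minimum mentioned above. The helper names are hypothetical, and it assumes version strings of the form seen in this thread (for example "2.28-r1209"):

```python
import re

MIN_MINIMAP2 = (2, 19)  # minimum version stated in this thread


def parse_minimap2_version(text):
    """Parse a string like '2.28-r1209' into a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized minimap2 version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text, minimum=MIN_MINIMAP2):
    """Return True if the version string is at least the given minimum."""
    return parse_minimap2_version(text) >= minimum


print(meets_minimum("2.15-r905"))   # False: older than 2.19
print(meets_minimum("2.28-r1209"))  # True
```

The version string itself can be read from the output of `minimap2 --version`, run inside the same conda environment that NanoVar uses, since that is the copy NanoVar will call.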

lovelycatZ commented 4 months ago

Thanks for sending me this important information. It would be nice if the fixed version of NanoVar were available from PyPI or Conda. The latest version is v1.5.1 on Conda and 1.7.0 on PyPI.


cytham commented 1 month ago

@lovelycatZ NanoVar v1.8.1 is now available from PyPI and Conda. Thanks for your understanding.