cbg-ethz / V-pipe

V-pipe is a pipeline designed for analysing NGS data of short viral genomes
https://cbg-ethz.github.io/V-pipe/
Apache License 2.0

`cyvcf2 = 0.30.11` leads to `ValueError: numpy.dtype size changed` #167

Open GeertvanGeest opened 3 days ago

GeertvanGeest commented 3 days ago

Describe the bug

The script enhance_bcf.py uses cyvcf2 v0.30.11. On a Mac, importing it fails with the following error:

Traceback (most recent call last):
  File "/vp-analysis/V-pipe/workflow/rules/../scripts/enhance_bcf.py", line 6, in <module>
    from cyvcf2 import VCF, Writer
  File "/vp-analysis/work/.snakemake/conda/1fbc2ad6a7160a5db06e7f669646c611_/lib/python3.9/site-packages/cyvcf2/__init__.py", line 1, in <module>
    from .cyvcf2 import (VCF, Variant, Writer, r_ as r_unphased, par_relatedness,
  File "cyvcf2/cyvcf2.pyx", line 1, in init cyvcf2.cyvcf2
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
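The traceback shows the failure happens at import time and surfaces as a `ValueError`, which typically means a compiled extension was built against a different NumPy version than the one installed. A minimal sketch for checking this inside the rule's conda environment, without running the whole pipeline, could look as follows. The helper name `try_import` is hypothetical and not part of V-pipe or cyvcf2.

```python
import importlib


def try_import(module_name):
    """Try to import a module and return (ok, detail) instead of raising.

    Hypothetical helper for diagnosis only; not part of V-pipe.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers ModuleNotFoundError as well (it is a subclass of ImportError).
        return False, f"not installed or not importable: {exc}"
    except ValueError as exc:
        # "numpy.dtype size changed" is raised as ValueError at import time.
        return False, f"possible binary incompatibility: {exc}"
    version = getattr(module, "__version__", "unknown")
    return True, f"version {version}"


if __name__ == "__main__":
    # Check NumPy first, then the extension that is compiled against it.
    for name in ("numpy", "cyvcf2"):
        ok, detail = try_import(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
```

Running this with the Python interpreter of the affected conda environment should show whether cyvcf2 is missing entirely or present but incompatible with the installed NumPy.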

To Reproduce

curl -O 'https://raw.githubusercontent.com/cbg-ethz/V-pipe/master/utils/quick_install.sh'
bash quick_install.sh -p vp-analysis -w work

cd vp-analysis/work

# clone a specific branch of the V-pipe repository
# the two lines below can be deleted when this branch is merged into the main branch
rm -rf ../V-pipe
git clone -b gsod-rtd https://github.com/GeertvanGeest/V-pipe.git ../V-pipe

# copy the example data from the repository to your working directory
cp -r ../V-pipe/docs/example_HIV_data/* .
# check what will be run with a dry run
./vpipe -n
# run vpipe on a small HIV test dataset
# this will install all dependencies and run the pipeline
./vpipe --cores 2

Expected behavior

cyvcf2 loads without errors.

Desktop

macOS 14.6.1 (23G93)

Possible fix

Change workflow/envs/bcftools.yaml to:

channels:
  - conda-forge
  - bioconda
dependencies:
  - bcftools = 1.13
  - cyvcf2 = 0.31.1
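
To confirm which versions a given checkout actually pins (for example, a fork versus master), the pins can be read straight from the environment file. The sketch below is an illustration only: `read_pins` is a hypothetical helper, and it assumes the simple `- name = version` form shown above rather than the full conda specification syntax.

```python
import re

# Matches dependency lines such as "  - cyvcf2 = 0.31.1" (assumed simple form).
PIN_RE = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)\s*=+\s*([0-9][\w.]*)\s*$")

# Copy of the environment proposed in this issue, used here as example input.
EXAMPLE_ENV = """\
channels:
  - conda-forge
  - bioconda
dependencies:
  - bcftools = 1.13
  - cyvcf2 = 0.31.1
"""


def read_pins(env_text):
    """Return {package: version} for every pinned dependency line."""
    pins = {}
    for line in env_text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


if __name__ == "__main__":
    for package, version in read_pins(EXAMPLE_ENV).items():
        print(f"{package} is pinned to {version}")
```

To inspect a real checkout, pass the contents of workflow/envs/bcftools.yaml to `read_pins` instead of `EXAMPLE_ENV`.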

GeertvanGeest commented 3 days ago

Related issue: https://github.com/brentp/cyvcf2/issues/307

See also this post on Stack Overflow: https://stackoverflow.com/questions/78634235/numpy-dtype-size-changed-may-indicate-binary-incompatibility-expected-96-from

GeertvanGeest commented 3 days ago

Can confirm that updating to cyvcf2 = 0.31.1 fixes the issue.

gordonkoehn commented 2 days ago

Thank you, @GeertvanGeest, for reporting this in detail with a solution on hand!

May I kindly ask you to help me reproduce and understand how you came up with the dependency fix?

I've had some trouble reproducing the exact error. On my machine, I run into:

rule consensus_bcftools:
...

  File "/Users/koehng/Workspace/cyvcf_new/vp-analysis/V-pipe/workflow/rules/../scripts/enhance_bcf.py", line 6, in <module>
    from cyvcf2 import VCF, Writer
ModuleNotFoundError: No module named 'cyvcf2'

(on both ARM and Intel architectures)

This happens when I follow your steps with workflow/envs/bcftools.yaml modified as you suggested. It also appears to happen in the current setup with bcftools = 1.20 and cyvcf2 = 0.31.0.

Thank you for sharing the GitHub thread; I follow the reasoning and agree to pin cyvcf2 = 0.31.1 once I can reproduce the error.

GeertvanGeest commented 4 hours ago

Whoops! Indeed, this seems to have been fixed 4 months ago in 5befca4. Should've kept my fork up to date ...

Not sure why you're getting the 'module not found' error. Possibly the conda env is not loaded properly? I'll retry after merging my fork with master.
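
One way to test the "conda env not loaded" guess is to ask the interpreter that runs the script which environment it belongs to and whether it can locate the module at all. This is only a sketch under that assumption; `describe_environment` is a hypothetical helper, not something V-pipe provides.

```python
import importlib.util
import sys


def describe_environment(module_name):
    """Report the active interpreter and whether a top-level module is findable.

    Hypothetical diagnostic helper; expects a top-level module name.
    """
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # interpreter actually running the script
        "prefix": sys.prefix,          # root of the active (conda) environment
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment("cyvcf2").items():
        print(f"{key}: {value}")
```

If `prefix` does not point into the `.snakemake/conda/` directory of the working directory, the script is probably running outside the environment that Snakemake created for the rule.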

GeertvanGeest commented 3 hours ago

Can confirm that after merging with cbg-ethz/V-pipe:master this error did not occur for me. So for me it's solved.