38 / d4-format

The D4 Quantitative Data Format
MIT License

Bug: OSError: memory map must have a non-zero length in load_to_np_impl #70

Open · mrvollger opened this issue 1 year ago

mrvollger commented 1 year ago

Hi @38 and @arq5x,

I am getting an error when trying to open a d4 matrix in pyd4:

OSError: memory map must have a non-zero length

I have tried remaking the input file a few times, but I keep getting this error. Interestingly, if I use the command-line tool d4tools, I get no error accessing the same region. I have also used the Python code successfully on three other samples, but it fails on this one, so I am at a bit of a loss.

I've included details and inputs below. Thanks in advance!

Details: here is the full traceback of the error:

python test.d4.py 
Traceback (most recent call last):
  File "/mmfs1/gscratch/stergachislab/mvollger/projects/GM12878_aCRE_2022-08-16/test.d4.py", line 14, in <module>
    matrix["chr1", 0, 1000]
  File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 100, in __getitem__
    data = [track[key] for track in self.tracks]
  File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 100, in <listcomp>
    data = [track[key] for track in self.tracks]
  File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 430, in __getitem__
    return self.load_to_np(key)
  File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 513, in load_to_np
    return self._for_each_region(regions, load_to_np_impl)
  File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 454, in _for_each_region
    ret.append(func(name, begin, end))
  File "/mmfs1/gscratch/stergachislab/mvollger/miniconda3/envs/fiberseq-smk/lib/python3.9/site-packages/pyd4/__init__.py", line 507, in load_to_np_impl
    self.load_values_to_buffer(name, begin, end, buf_addr)
OSError: memory map must have a non-zero length

But when I access the same region with d4tools, it works fine:

$ d4tools view results/Phased_GM12878_pat/fdr.coverages.d4 chr1:0-100000 | head
chr1    0       10000   0       0       0       0       0       0       0       0       0       0       0       0
chr1    10000   10001   0       0       0       1       0       0       0       0       0       1       5       13
chr1    10001   10003   0       0       0       1       0       0       0       0       0       1       5       14
chr1    10003   10009   0       0       0       1       0       0       0       0       0       1       5       16
chr1    10009   10012   0       0       0       1       0       0       0       0       0       1       4       17
chr1    10012   10014   0       0       0       1       0       0       0       0       0       0       4       18
chr1    10014   10031   0       0       0       1       0       0       0       0       0       0       3       19
chr1    10031   10032   0       0       0       1       0       0       0       0       0       0       4       18
chr1    10032   10033   0       0       0       1       0       0       0       0       0       0       3       19
chr1    10033   10043   0       0       0       1       0       0       0       0       0       0       2       20
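Because d4tools view reads the same region without error, one possible stopgap while the pyd4 bug is open is to call it from Python and parse its text output. This is a minimal sketch, not part of pyd4: `parse_view_lines`, `to_dense`, and `view_region` are hypothetical helpers, and it assumes `d4tools` is on PATH, that values are integers, and that each line is chrom, begin, end followed by one value per track, as in the output above. Bases not covered by any output line are assumed to be 0.

```python
import subprocess
from typing import Iterable, List, Tuple

# (chrom, begin, end, one value per track); layout assumed from d4tools view output
Interval = Tuple[str, int, int, List[int]]


def parse_view_lines(lines: Iterable[str]) -> List[Interval]:
    """Parse d4tools view text into run-length intervals (hypothetical helper)."""
    out: List[Interval] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, begin, end = fields[0], int(fields[1]), int(fields[2])
        out.append((chrom, begin, end, [int(v) for v in fields[3:]]))
    return out


def to_dense(intervals: List[Interval], begin: int, end: int) -> List[List[int]]:
    """Expand intervals into one row of track values per base in [begin, end).

    Bases not covered by any interval are left as zeros (an assumption).
    """
    n_tracks = len(intervals[0][3]) if intervals else 0
    rows = [[0] * n_tracks for _ in range(end - begin)]
    for _, s, e, vals in intervals:
        for pos in range(max(s, begin), min(e, end)):
            rows[pos - begin] = list(vals)
    return rows


def view_region(path: str, region: str) -> List[Interval]:
    """Run `d4tools view PATH REGION` (streamed IO) and parse its stdout."""
    result = subprocess.run(
        ["d4tools", "view", path, region],
        capture_output=True, text=True, check=True,
    )
    return parse_view_lines(result.stdout.splitlines())
```

For example, `to_dense(view_region("results/Phased_GM12878_pat/fdr.coverages.d4", "chr1:0-100000"), 0, 100000)` would give 100000 rows with one column per track. Expanding to per-base rows trades memory for a layout closer to what a NumPy slice would return; keeping the intervals as-is is cheaper for large regions.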

Here is a link to the file:
https://eichlerlab.gs.washington.edu/help/mvollger/tracks/fiberseq/fdr.coverages.d4

And here is the Python code that produces the error:

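import logging

import pyd4

# Without basicConfig, logging.info() output is suppressed (default level is WARNING)
logging.basicConfig(level=logging.INFO)

in_d4 = "./results/Phased_GM12878_pat/fdr.coverages.d4"
logging.info(f"Reading in d4 file: {in_d4}")
d4 = pyd4.D4File(in_d4)
logging.info(f"Opened d4 file: {in_d4}")
chroms = d4.chroms()
# Open every track in the file as a single matrix object
matrix = d4.open_all_tracks()
track_names = matrix.track_names
logging.info("Trying to open d4 matrix")
# This slice raises "OSError: memory map must have a non-zero length"
matrix["chr1", 0, 100000]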

38 commented 1 year ago

Thanks for reporting the issue. It seems this is a bug in the memory-mapped IO interface. d4tools view doesn't hit it because it uses streamed IO instead. I've committed a potential fix to the repo; please let me know if the latest commit solves your issue.

Thanks! Hao
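Hao's explanation hinges on the difference between memory-mapped and streamed reads. The Python sketch below is only an analogy, not pyd4's code or the committed fix: `read_range_mapped` and `read_range_streamed` are hypothetical helpers, and the length guard is one assumed way such a bug could be avoided. A streamed read of zero bytes just returns nothing, whereas a zero-length request must never reach the mapping call.

```python
import mmap


def read_range_mapped(path: str, begin: int, end: int) -> bytes:
    """Read bytes [begin, end) of a file through a read-only memory map.

    Hypothetical helper for illustration. An empty range returns b"" without
    calling mmap: some mmap wrappers reject a zero length outright (the error
    in this issue), and Python's mmap treats length 0 as "map the whole file".
    """
    length = end - begin
    if length <= 0:
        return b""  # guard: never request a zero-length mapping
    # mmap offsets must be a multiple of ALLOCATIONGRANULARITY, so align down
    aligned = begin - begin % mmap.ALLOCATIONGRANULARITY
    delta = begin - aligned
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), length + delta,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            return m[delta:delta + length]


def read_range_streamed(path: str, begin: int, end: int) -> bytes:
    """Read the same range with ordinary seek/read (streamed) IO.

    A zero-length read simply returns b"", so no guard is needed here.
    """
    with open(path, "rb") as f:
        f.seek(begin)
        return f.read(max(end - begin, 0))
```

For example, `read_range_mapped(path, 7, 7)` returns `b""` instead of attempting a mapping, matching what `read_range_streamed` does naturally.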