aidenlab / straw

Extract data quickly from Juicebox via straw
MIT License

Segmentation fault when calling mzd.getRecordsAsMatrix #125

WangJiuming closed this issue 1 year ago

WangJiuming commented 1 year ago

I am trying to load the contact map of an entire chromosome as an array. A segmentation fault occurs when calling the getRecordsAsMatrix() method. I originally applied it to some local data downloaded from 4DN and GEO, and later tested it with a URL from the Juicebox menu (see code below). The error persisted in all cases.

Here is the code to reproduce the error.

import hicstraw

if __name__ == '__main__':
    chr_num = '1'

    hic_obj = hicstraw.HiCFile('https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined.hic')

    mzd = hic_obj.getMatrixZoomData(chr_num, chr_num, 'observed', 'NONE', 'BP', 10000)

    chr_dict = {c.name: c for c in hic_obj.getChromosomes()}  # chr number as a str: chr object

    matrix = mzd.getRecordsAsMatrix(0, chr_dict[chr_num].length, 0, chr_dict[chr_num].length)

The exact error in this case is simply

Segmentation fault (core dumped)

For your reference, I am working with hic-straw version 1.3.1 and the getRecords() method seems to work fine.

Does this indicate a bug in the package, or is there a recommended alternative for efficiently loading the contact map of an entire chromosome?

Thanks in advance!

sa501428 commented 1 year ago

Does it work with chr1 instead of 1?

WangJiuming commented 1 year ago

Hi, thanks for your prompt help!

Unfortunately, using 'chr1' instead of '1' would give rise to a KeyError instead.

Error finding block data
Traceback (most recent call last):
  File "/home/code/test.py", line 12, in <module>
    matrix = mzd.getRecordsAsMatrix(0, chr_dict[chr_num].length, 0, chr_dict[chr_num].length)
KeyError: 'chr1'

Process finished with exit code 1

sa501428 commented 1 year ago

Oh wait, I misread the original message. Yeah it looks like you're trying to create a dense 25000x25000 matrix - that will not work well. You can use the records list to create a scipy sparse matrix if you need the entire matrix all at once.
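
A minimal sketch of the sparse-matrix suggestion above, not an official hicstraw recipe. It assumes each record exposes binX, binY and counts, that binX and binY are bin start positions in bp, and that the records cover only one triangle of the symmetric map. The Record type and the records_to_sparse helper are hypothetical names used for illustration, so the snippet runs without hicstraw or a .hic file.

```python
from collections import namedtuple

import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical stand-in for the records returned by mzd.getRecords();
# assumed to expose binX, binY (bin start positions in bp) and counts.
Record = namedtuple("Record", ["binX", "binY", "counts"])


def records_to_sparse(records, resolution, n_bins, mirror=True):
    """Assemble contact records into a scipy.sparse CSR matrix."""
    rows, cols, vals = [], [], []
    for rec in records:
        i = int(rec.binX) // resolution  # genomic position -> bin index
        j = int(rec.binY) // resolution
        rows.append(i)
        cols.append(j)
        vals.append(rec.counts)
        # Assumption: records cover only one triangle of the symmetric
        # map, so mirror off-diagonal entries to fill the other half.
        if mirror and i != j:
            rows.append(j)
            cols.append(i)
            vals.append(rec.counts)
    coo = coo_matrix((vals, (rows, cols)), shape=(n_bins, n_bins), dtype=np.float32)
    return coo.tocsr()  # CSR supports fast row slicing and arithmetic


# Tiny synthetic example at 10 kb resolution (not real Hi-C data).
demo = [Record(0, 0, 5.0), Record(0, 20000, 2.0), Record(10000, 20000, 3.0)]
matrix = records_to_sparse(demo, resolution=10000, n_bins=3)
print(matrix.toarray())
```

With the code from the original post, the records would presumably come from mzd.getRecords(0, length, 0, length), with n_bins set to length // 10000 + 1. Only the nonzero contacts are stored, so memory scales with the number of records rather than with the square of the chromosome length.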

WangJiuming commented 1 year ago

I see. Thanks for your help! I will close this issue now.