aidenlab / straw

Extract data quickly from Juicebox via straw
MIT License

Issue loading .hic file from 4DN Data Portal #51

Closed srcoulombe closed 4 years ago

srcoulombe commented 4 years ago

Describe the bug: The current Python version of straw (and strawC) isn't able to load a file downloaded from the 4DN Data Portal. Running straw/python/read_hic_header.py on the .hic file suggests that the file should be readable, though.

To Reproduce Steps to reproduce the behavior:

  1. Go to https://data.4dnucleome.org/files-processed/4DNFI4OUMWZ8/

  2. Click on download 4DNFI4OUMWZ8.hic

  3. Run straw/python/read_hic_header.py <path/to/4DNFI4OUMWZ8.hic>; you should see:

    HiC version : 8
    Master index : 17049250642
    Genome ID : /var/lib/cwl/stga58048be-68cd-4724-b145-c172fb1dd45b/4DNFI3UBJ3HZ.chrom.sizes
    Chromosomes : {'ALL': 2725521, '1': 195471971, '2': 182113224, '3': 160039680, '4': 156508116, '5': 151834684, '6': 149736546, '7': 145441459, '8': 129401213, '9': 124595110, '10': 130694993, '11': 122082543, '12': 120129022, '13': 120421639, '14': 124902244, '15': 104043685, '16': 98207768, '17': 94987271, '18': 90702639, '19': 61431566, 'X': 171031299, 'Y': 91744698}
    Base pair-delimited resolutions : [10000000, 5000000, 2500000, 1000000, 500000, 250000, 100000, 50000, 25000, 10000, 5000, 2000, 1000]
    Fragment-delimited resolutions : []

  4. In IPython, try using strawC:

    import strawC
    straw_out = strawC.strawC('NONE', r'path/to/4DNFI4OUMWZ8.hic', '5', '5', 'BP', 500000)

    See: File doesn't have the given chr_chr map 5_5

  5. In IPython, try using straw.straw:

    import straw
    straw_out = straw.straw('NONE', '4DNFI4OUMWZ8.hic', '5', '5', 'BP', 500000)
    HiC version:  8

See:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-6-00891572fa3d> in <module>
----> 1 straw_out = straw.straw('NONE', '4DNFI4OUMWZ8.hic', '5', '5', 'BP', 500000)

~/miniconda3/lib/python3.7/site-packages/straw/straw.py in straw(norm, infile, chr1loc, chr2loc, unit, binsize, is_synapse)
    508         req.seek(master)
    509
--> 510     list1 = readFooter(req, c1, c2, norm, unit, binsize)
    511     myFilePos=list1[0]
    512     c1NormEntry=list1[1]

~/miniconda3/lib/python3.7/site-packages/straw/straw.py in readFooter(req, c1, c2, norm, unit, resolution)
    130     c1NormEntry=dict()
    131     c2NormEntry=dict()
--> 132     nBytes = struct.unpack('<i', req.read(4))[0]
    133     key = str(c1) + "_" + str(c2)
    134     nEntries = struct.unpack('<i', req.read(4))[0]

error: unpack requires a buffer of 4 bytes

Expected behavior Being able to read the Hi-C file?
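
A note on the traceback above: `struct.unpack('<i', req.read(4))` fails with "unpack requires a buffer of 4 bytes" when `req.read(4)` returns fewer than 4 bytes, and the preceding `req.seek(master)` suggests the read happens at the master index. One possible cause (an assumption, not confirmed in this thread) is a local file that is shorter than the master index recorded in its header, for example after an interrupted download. The sketch below reads the magic string, version, and master index from the start of the file and compares the master index with the size on disk. The function name `check_hic_master_index` is hypothetical and is not part of straw.

```python
import os
import struct


def check_hic_master_index(path):
    """Compare the master index stored in a .hic header with the file size.

    Returns (version, master_index, file_size, footer_readable).
    Hypothetical helper for illustration; not part of straw.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        # A .hic file starts with the null-terminated magic string "HIC".
        magic = f.read(4)
        if magic != b"HIC\x00":
            raise ValueError("missing 'HIC' magic string; not a .hic file?")
        # Next come a little-endian int32 version and int64 master index.
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file ends inside the header")
        version, master_index = struct.unpack("<iq", header)
    # readFooter starts by reading a 4-byte int at the master index,
    # so at least master_index + 4 bytes must exist on disk.
    footer_readable = master_index + 4 <= file_size
    return version, master_index, file_size, footer_readable
```

For the file in this report, the header lists a master index of 17049250642, so a complete copy would have to be larger than roughly 17 GB. If `footer_readable` comes back `False`, downloading the file again is probably the first thing to try.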

srcoulombe commented 4 years ago

I was able to read 4DNFI9DCUOQ1.hic though!

nchernia commented 4 years ago

Do you know if the original problematic file loads in Juicebox? The fact that you could load another might mean there's a corruption issue with the first one. We have a "validate" command you could try with the Juicer Tools jar as well.
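
Independently of Juicebox or the Juicer Tools "validate" command, a checksum comparison is another way to check for a corrupted or partial download. The sketch below computes an MD5 digest of a local file in chunks, so that a multi-gigabyte .hic file does not have to fit in memory. It assumes the source publishes a checksum to compare against, which is not stated in this thread; `md5_of_file` is a hypothetical helper name.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Hypothetical helper for illustration; compare the result with a
    checksum published by the data source, if one is available.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A mismatch between the local digest and the published one would point to a download problem rather than a bug in straw.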


srcoulombe commented 4 years ago

I get the following error message when I try to open the problematic file in Juicebox using the Load Tracks/URL option:

Error loading track 4DNFI4OUMWZ8.hic: Error accessing resource: https://data.4dnucleome.org/files-processed/4DNFI4OUMWZ8/@@download/4DNFI4OUMWZ8.hic Status: 0

nchernia commented 4 years ago

This seems to be working now. I'm not really sure what happened: either someone at 4DN updated the file or there was a server issue.


srcoulombe commented 4 years ago

Yeah that's worked out, closing.