ay-lab / mustache

Multi-scale Detection of Chromatin Loops from Hi-C and Micro-C Maps using Scale-Space Representation
MIT License
64 stars 11 forks source link

Legacy hic-straw code comment #55

Closed joreynajr closed 2 years ago

joreynajr commented 2 years ago

Just making a comment, as of hic-straw 1.1.0 there is a new format to access HiC file data which is still being used in Mustache. To make it future proof you can switch to this newer version which just slightly modifies a few lines of code here: https://github.com/ay-lab/mustache/blob/master/mustache/mustache.py

Lines 307-329 are currently:

    if not CHRM_SIZE:
        hic = hicstraw.HiCFile(f)
        chromosomes = hic.getChromosomes()
        chrSize_in_bp = {}
        for i in range(1, len(chromosomes)):
            chrSize_in_bp["chr" + str(chromosomes[i].name).replace("chr", '')] = chromosomes[i].length
        CHRM_SIZE = chrSize_in_bp["chr" + chr1.replace("chr", '')]

    CHUNK_SIZE = max(2 * distance_in_bp / res, 2000)
    start = 0
    end = min(CHRM_SIZE, CHUNK_SIZE * res)  # CHUNK_SIZE*res
    result = []
    val = []

    while start < CHRM_SIZE:
        print(int(start), int(end))
        if not norm_method:
            temp = hicstraw.straw("observed", "KR", f, str(chr1) + ":" + str(int(start)) + ":" + str(int(end)),
                                  str(chr2) + ":" + str(int(start)) + ":" + str(int(end)), "BP", res)
        else:
            temp = hicstraw.straw("observed", str(norm_method), f,
                                  str(chr1) + ":" + str(int(start)) + ":" + str(int(end)),
                                  str(chr2) + ":" + str(int(start)) + ":" + str(int(end)), "BP", res)

And they can now be switched to the new hic-straw version using:

    # create a hicstraw object linked to file f
    hic = hicstraw.HiCFile(f)

    # get chrom data if needed
    if not CHRM_SIZE:
        chromosomes = hic.getChromosomes()
        chrSize_in_bp = {}
        for i in range(1, len(chromosomes)):
            chrSize_in_bp["chr" + str(chromosomes[i].name).replace("chr", '')] = chromosomes[i].length
        CHRM_SIZE = chrSize_in_bp["chr" + chr1.replace("chr", '')]

    CHUNK_SIZE = max(2 * distance_in_bp / res, 2000)
    start = 0
    end = min(CHRM_SIZE, CHUNK_SIZE * res)  # CHUNK_SIZE*res
    result = []
    val = []

    # extract count data
    while start < CHRM_SIZE:
        print(int(start), int(end))
        if not norm_method:
            mzd = hic.getMatrixZoomData(str(chr1), str(chr2), "observed", "KR", "BP", res)
            temp = mzd.getRecordsAsMatrix(int(start), int(end), int(start), int(end))
        else:
            mzd = hic.getMatrixZoomData(str(chr1), str(chr2), "observed", str(norm_method), "BP", res)
            temp = mzd.getRecordsAsMatrix(int(start), int(end), int(start), int(end))

For more info checkout: https://pypi.org/project/hic-straw/