deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0
227 stars 70 forks source link

hicTransform for matrix arithmetic? #831

Closed jaclyn-marie closed 1 year ago

jaclyn-marie commented 1 year ago

Hi!

Thank you for developing a great suite of tools, and for the excellent documentation. If you have a moment, I have a question regarding editing one/two of your .py scripts.

I would like to try to scale my matrix (HiChIP data) in a CPM: (counts/total counts)*100000. I can understand most of the code, but I am still a beginner. Would editing parts of your utilities/hicTransform code be a quick way to achieve this? I think these would be the chunks of code I would have to combine into a new .py or .sh (importing all dependencies accordingly).

I understand if this is beyond your interest, but any tips would be greatly appreciated!

Sincerely, Jaclyn

hicTransform (editing obs/exp to cpm accordingly)

import warnings
warnings.simplefilter(action="ignore", category=RuntimeWarning)
warnings.simplefilter(action="ignore", category=PendingDeprecationWarning)
import argparse

from scipy.sparse import csr_matrix, lil_matrix
import numpy as np

from hicmatrix import HiCMatrix as hm
from hicexplorer._version import __version__
from hicexplorer.utilities import obs_exp_matrix_lieberman, obs_exp_matrix_non_zero, obs_exp_matrix
from hicexplorer.utilities import convertNansToZeros, convertInfsToZeros

import logging
log = logging.getLogger(__name__)

def parse_arguments(args=None):
def _obs_exp(pSubmatrix):

    obs_exp_matrix_ = obs_exp_matrix(pSubmatrix)
    obs_exp_matrix_ = convertNansToZeros(csr_matrix(obs_exp_matrix_))
    obs_exp_matrix_ = convertInfsToZeros(csr_matrix(obs_exp_matrix_))
    # log.error('obs_exp_matrix_.data {}'.format(obs_exp_matrix_.data))
    # if len(obs_exp_matrix_.data) == 0:
    #     log.debug('No data!')
    #     return np.array([[]])
    return obs_exp_matrix_  # .todense()
def main(args=None):

    args = parse_arguments().parse_args(args)

    if not args.outFileName.endswith('.h5') and not args.outFileName.endswith('.cool'):
        log.error('Output filetype not known.')
        log.error('It is: {}'.format(args.outFileName))
        log.error('Accepted is .h5 or .cool')
        exit(1)

    if args.matrix.endswith('cool') and args.chromosomes is not None and len(args.chromosomes) == 1:
        hic_ma = hm.hiCMatrix(pMatrixFile=args.matrix, pChrnameList=args.chromosomes)
    else:
        hic_ma = hm.hiCMatrix(pMatrixFile=args.matrix)
        if args.chromosomes:
            hic_ma.keepOnlyTheseChr(args.chromosomes)

    trasf_matrix = lil_matrix(hic_ma.matrix.shape)

    if args.method == 'obs_exp':
        if args.perChromosome:

            for chrname in hic_ma.getChrNames():
                chr_range = hic_ma.getChrBinRange(chrname)
                submatrix = hic_ma.matrix[chr_range[0]:chr_range[1], chr_range[0]:chr_range[1]]
                submatrix.astype(float)
                submatrix_chr = _obs_exp(submatrix)
                if len(submatrix_chr.data) == 0:
                    submatrix_chr = lil_matrix(submatrix_chr.shape)
                else:
                    submatrix_chr = lil_matrix(submatrix_chr)
                trasf_matrix[chr_range[0]:chr_range[1], chr_range[0]:chr_range[1]] = submatrix_chr
        else:
            submatrix = _obs_exp(hic_ma.matrix)
            trasf_matrix = csr_matrix(submatrix)

Utils (edit the function accordingly) import...

def obs_exp_matrix_lieberman(pSubmatrix, pLength_chromosome, pChromosome_count):
    """
        Creates normalized contact matrix M* by
        dividing each entry by the gnome-wide
        expected contacts for loci at
        that genomic distance. Method: Lieberman-Aiden 2009
    """

    expected_interactions_in_distance_ = expected_interactions_in_distance(pLength_chromosome, pChromosome_count, pSubmatrix)
    row, col = pSubmatrix.nonzero()
    distance = np.ceil(np.absolute(row - col) / 2).astype(np.int32)

    if len(pSubmatrix.data) > 0:
        data_type = type(pSubmatrix.data[0])

        expected = expected_interactions_in_distance_[distance]
        pSubmatrix.data = pSubmatrix.data.astype(np.float32)
        pSubmatrix.data /= expected
        pSubmatrix.data = convertInfsToZeros_ArrayFloat(pSubmatrix.data).astype(data_type)
    return pSubmatrix
lldelisle commented 1 year ago

Hi, Maybe I misanderstood what you want to do but I think you can use hicNormalize to multiply by a constant that you can obtain with hicInfo.

jaclyn-marie commented 1 year ago

Hi @lldelisle,

Thank you for the suggestion! I will give this a try. This is a much easier fix than the idea I had : )

Jaclyn

jaclyn-marie commented 1 year ago

Thank you @lldelisle, hicNormalize is what was looking for!