4dn-dcic / hic2cool

Lightweight converter between hic and cool contact matrices.
MIT License

pip version broken #24

Closed joachimwolff closed 5 years ago

joachimwolff commented 5 years ago

Hi,

installing hic2cool version 0.5 with pip fails with:

Downloading https://files.pythonhosted.org/packages/2b/28/65b9170c2e24d3b2b8864e176e5de61f2abae4bdafaff162e34f2a3387f2/hic2cool-0.5.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-ohDcFj/hic2cool/setup.py", line 10, in <module>
        with open('requirements.txt') as f:
    IOError: [Errno 2] No such file or directory: 'requirements.txt'

However, version 0.4.2 works.

Best,

Joachim

carlvitzthum commented 5 years ago

Hi @joachimwolff,

Thanks for the feedback. I have released version 0.5.1, which should fix this problem. Please let me know if the pip install continues to fail.

Also, if you have created any cooler files using version 0.4.2, please make sure to update them with hic2cool update once you have the new version. See here for more info.

Best, Carl

joachimwolff commented 5 years ago

Hi,

Thanks for the update and the warning regarding division and multiplication of correction factors. Do I understand correctly that previously you did CorrectedMatrix_i,j = originalMatrix_i,j * correctionFactor_i * correctionFactor_j and this changes to CorrectedMatrix_i,j = originalMatrix_i,j / correctionFactor_i * correctionFactor_j, so that the new correction factors are just 1/correctionFactor of the previous ones?

Do you provide any flag in the cool files that records which version of hic2cool they were created with? I use your API in my software to transform hic to cool files, and I don't think my users will want to take care of this issue on their own.

Best,

Joachim

carlvitzthum commented 5 years ago

@joachimwolff

Yes, you understand correctly. For more information you can see this issue, written by the creator of cooler.

To your second question, there is an attribute in the cooler file called generated-by that keeps track of the hic2cool version used to make the file. hic2cool update automatically uses it to determine what changes to make. Currently the only change is the inversion of weights introduced in version 0.5.0. After updating, the attribute is modified so further updates will not run again on the same file.

You can use cooler or h5py to get the generated-by info:

import cooler
# replace '<cooler file>' with the path to your cooler file
cool = cooler.Cooler('<cooler file>')
# generated-by is in form: 'hic2cool_' + version
cool.info['generated-by']
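
Once the generated-by string has been read, deciding whether a file still carries the old inverted weights comes down to a version comparison. The following is a minimal sketch, not part of hic2cool: the helper names hic2cool_version and has_inverted_weights are hypothetical, and it assumes the attribute has the 'hic2cool_' + version form described above (a '-' separator is also accepted as a precaution).

```python
import re

# Assumption: the attribute looks like 'hic2cool_0.4.2'; '-' is also accepted.
_GENERATED_BY = re.compile(r"^hic2cool[_-](\d+(?:\.\d+)*)")


def hic2cool_version(generated_by):
    """Return the version as a 3-tuple of ints, or None if not written by hic2cool."""
    match = _GENERATED_BY.match(generated_by)
    if match is None:
        return None
    parts = tuple(int(part) for part in match.group(1).split("."))
    # Pad short versions such as '0.5' to (0, 5, 0) so tuple comparison behaves.
    return (parts + (0, 0, 0))[:3]


def has_inverted_weights(generated_by):
    """True if the file was written by hic2cool < 0.5.0 (weights stored as 1/hic_weight)."""
    version = hic2cool_version(generated_by)
    return version is not None and version < (0, 5, 0)
```

In practice the argument would be cool.info['generated-by']; a True result suggests the file should be passed through hic2cool update before its weights are used divisively.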

If you want to programmatically update files from older versions, you can leverage the hic2cool package as so:

from hic2cool import hic2cool_update
# replace the '<...>' placeholders with real file paths
# will update the input cooler file directly. silent=True disables command line confirmation
hic2cool_update('<cooler file>', silent=True)
# OR, leave the input file unchanged and write to a new one
hic2cool_update('<cooler file>', '<target cooler file>', silent=True)

You can also provide the silent argument from the command line: hic2cool_update <file> --silent.

Best, Carl

joachimwolff commented 5 years ago

Hi Carl,

Thanks for your useful reply. I have one more question:

Is it: CorrectedMatrix_i,j = originalMatrix_i,j / correctionFactor_i * correctionFactor_j as I wrote or CorrectedMatrix_i,j = originalMatrix_i,j / correctionFactor_i / correctionFactor_j as it is written in the comment from @nvictus?

Thanks a lot.

Joachim

carlvitzthum commented 5 years ago

@joachimwolff @nvictus

In general:

Hic balancing: CorrectedMatrix_i,j = count_i,j / (hic_weight_i * hic_weight_j)
Cooler balancing: CorrectedMatrix_i,j = count_i,j * cooler_weight_i * cooler_weight_j

Cooler still uses multiplicative balancing. We just no longer invert the hic weights.

hic2cool < 0.5.0: weight = 1/hic_weight and therefore CorrectedMatrix_i,j = count_i,j * weight_i * weight_j
hic2cool >= 0.5.0: weight = hic_weight and therefore CorrectedMatrix_i,j = count_i,j / (weight_i * weight_j)

As of hic2cool 0.5.0, the weights from hic are preserved as divisive weights. cooler ICE still uses multiplicative weights. This change was implemented to reach a standard among multiple tools that did or did not invert juicer weights. Downstream tools are now expected to divide by weights named KR/VC/VC_SQRT. This was already the case for HiGlass. However, since cooler doesn't yet explicitly handle these exceptions, running cooler.matrix(balance= ... ) with a hic weight will no longer give the "correct" result, because the weights are no longer stored inverted.
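
The relationship between the two conventions can be checked with a toy example. This is only an illustrative sketch in plain Python: the function names and the numbers are made up for the demonstration and are not taken from hic2cool or cooler.

```python
def balance_divisive(counts, weights):
    """hic-style balancing: count_i,j / (weight_i * weight_j)."""
    n = len(weights)
    return [[counts[i][j] / (weights[i] * weights[j]) for j in range(n)]
            for i in range(n)]


def balance_multiplicative(counts, weights):
    """cooler-style balancing: count_i,j * weight_i * weight_j."""
    n = len(weights)
    return [[counts[i][j] * weights[i] * weights[j] for j in range(n)]
            for i in range(n)]


# Made-up 2x2 contact matrix and hic weights (illustrative values only).
counts = [[8.0, 16.0], [16.0, 32.0]]
hic_weights = [2.0, 4.0]

# hic2cool >= 0.5.0 stores hic_weights as-is, so they must be divided out.
corrected_new = balance_divisive(counts, hic_weights)

# hic2cool < 0.5.0 stored 1/hic_weight, so multiplying gave the same matrix.
inverted_weights = [1.0 / w for w in hic_weights]
corrected_old = balance_multiplicative(counts, inverted_weights)

# Multiplying by the non-inverted weights is the mistake to avoid.
misapplied = balance_multiplicative(counts, hic_weights)
```

Here corrected_new and corrected_old agree, while misapplied does not, which is the situation a tool ends up in when it multiplies by weights written by hic2cool 0.5.0 or later.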

Best, Carl

joachimwolff commented 5 years ago

Thanks a lot for the clarification.

We have always stored the data the divisive way in our HiCExplorer. Anyway, this is an unpleasant situation for our users if they don't know which type of correction factors they have stored in their cooler files.

nvictus commented 5 years ago

How about attaching some metadata to the attrs of the weight vectors? We already store some useful metadata when using cooler balance (you can take a look using the cooler attrs command).

It could be as simple as divisive: True. Taken to be False if missing (with the exception of the 3 special cases above).
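
A reader of such a file could apply the rule proposed above roughly as follows. This is a sketch of the suggestion only: the divisive attribute is a proposal in this thread, not an existing cooler feature, and is_divisive, apply_weights and the plain dict standing in for the weight column's attrs are all hypothetical.

```python
# The three special cases named above: treated as divisive when no flag is present.
DIVISIVE_BY_NAME = {"KR", "VC", "VC_SQRT"}


def is_divisive(name, attrs):
    """Decide how to apply a weight column from its name and (hypothetical) attrs."""
    if "divisive" in attrs:
        return bool(attrs["divisive"])
    # Flag missing: False, except for the special-cased hic normalizations.
    return name in DIVISIVE_BY_NAME


def apply_weights(counts, weights, name, attrs):
    """Balance a square matrix, dividing or multiplying as the metadata dictates."""
    n = len(weights)
    divisive = is_divisive(name, attrs)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            factor = weights[i] * weights[j]
            row.append(counts[i][j] / factor if divisive else counts[i][j] * factor)
        result.append(row)
    return result
```

For example, apply_weights(counts, weights, "KR", {}) would divide, apply_weights(counts, weights, "weight", {}) would multiply, and an explicit {"divisive": True} would override the name-based default either way.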