Closed Kelly-ST-HRI closed 1 year ago
I confirmed that this patch allows hic2cool update to run without errors and produce the expected result files. Headers are shown correctly with h5dump.
Thanks for your quick response. I’m honestly not sure of the differences between mcool v2 and v3. Perhaps this reference is helpful for planning a migration to support v3? https://cooler.readthedocs.io/en/latest/schema.html#previous-schema-versions
Please let me know if I am mistaken and the output files are already mcool v3 format. For context, we had trouble importing the data into 3rd party software which parses the format-version parameter from the root attributes, not the group attributes, as discussed here: #27
Sorry for the confusion, this change is unrelated to the original purpose of the PR. I also had to change group permissions on the output files to allow other users to import them into R but we confirmed it works now.
Hi @Kelly-ST-HRI !
The cooler data collection schema and the mcool layout are versioned separately. The mcool version just specifies a layout of many cooler data collections. Its latest version is 2. I guess this should be clarified in the docs.
Feel free to reach out if you have other questions.
Sorry for not carefully reading the diff before commenting. Looks like what you changed is the cooler format version, not the mcool version.
As the summary says, v3 really just adds a metadata tag storage-mode to indicate that a full matrix is stored instead of an upper triangle (e.g. to store a non-symmetric matrix). In v2, it is always assumed that the data is an upper triangle, representing a symmetric matrix.
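To make the difference between the two storage assumptions concrete, here is a minimal sketch in plain Python. The function name to_dense, its parameters, and the toy pixel list are illustrative assumptions, not part of cooler or hic2cool. It expands sparse (row, col, count) pixels into a dense matrix, mirroring across the diagonal only when the data is declared to be an upper triangle.

```python
def to_dense(pixels, n_bins, symmetric_upper=True):
    """Expand sparse (row, col, count) pixels into a dense n_bins x n_bins matrix.

    symmetric_upper=True  : pixels hold only the upper triangle of a symmetric
                            matrix, so each off-diagonal value is mirrored.
    symmetric_upper=False : pixels hold the full matrix and are used as-is.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for row, col, count in pixels:
        matrix[row][col] = count
        if symmetric_upper and row != col:
            matrix[col][row] = count  # fill in the lower triangle
    return matrix


# Hypothetical toy data: three pixels over three bins.
upper = [(0, 0, 5), (0, 1, 2), (1, 2, 7)]
dense = to_dense(upper, n_bins=3)
```

The same pixels read with symmetric_upper=False would leave the lower triangle empty, which is why a reader needs to know which assumption applies before interpreting the data.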
@nvictus Thank you for clarifying! Your original explanation actually helped me understand the different versioning parameters. However, it seems the duplicate format-version attributes may be parsed incorrectly by downstream 3rd party tools (e.g., R/bioconductor packages) because of this misunderstanding. If this is the case, I'll report it to the respective developer.
For now I've reverted changes to hic2cool/hic2cool_config.py as out of scope of this PR. I'll confirm from our internal data whether hic2cool is generating mcool v2 or v3 format results by checking h5dump for the storage-mode setting.
I confirmed that the output of hic2cool update is a file with "storage-mode" set to "symmetric-upper", as described for Cooler format v3. hic2cool convert does not set it, so I think it defaults to Cooler format v2 or relies on hic2cool update to add metadata to clarify this. The output appears to be backwards compatible with tools developed for v2 anyway since it is a symmetric matrix. I'll leave the original settings and submit this PR as-is.
This discussion also resolves my concerns about whether newer versions will be compatible with downstream tools. For example, HiCBricks states:
HiCBricks currently accepts mcool and cool files following cool format version 2 and all prior versions. I do not make any guarantees for future version
Thanks for your advice! I hope the minor PR is useful to others running into similar issues invoking the tool via Bash CLI rather than within Python.
the duplicate format-version attributes may be parsed by downstream 3rd party tools (e.g., R/bioconductor packages) If this is the case, I'll report it to the respective developer.
Oh, that's unfortunate. Thank you for reporting it!
HiCBricks currently accepts mcool and cool files following cool format version 2 and all prior versions. I do not make any guarantees for future version
I see. It sounds like this resulted from confusion in the thread you linked earlier: the format and format-version attributes should be considered jointly. It's meant to be a strength that a single cooler matrix can be interpreted in a consistent manner no matter where it is placed within the HDF5 file. That's why we version layouts separately from the schema. For example, it allowed us to support a new single-cell layout without having to change anything about the single-matrix schema.
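As a rough illustration of reading format and format-version together at every level of a file, the sketch below walks an HDF5 file with h5py and reports each group that carries a format attribute. The helper name find_tagged_groups is an assumption made for this example; only standard h5py calls (File, visititems, attrs) are used, and the path in the usage comment is hypothetical.

```python
import h5py


def _text(value):
    """Return an attribute value as str, decoding bytes if necessary."""
    return value.decode() if isinstance(value, bytes) else str(value)


def find_tagged_groups(path):
    """Map each group path to its (format, format-version) pair.

    Only the root and groups that carry a 'format' attribute are reported.
    format-version is None when the attribute is missing.
    """
    found = {}

    def record(name, obj):
        if "format" in obj.attrs:
            version = obj.attrs.get("format-version")
            found[name] = (
                _text(obj.attrs["format"]),
                None if version is None else int(version),
            )

    def visit(name, obj):
        if isinstance(obj, h5py.Group):
            record("/" + name, obj)

    with h5py.File(path, "r") as f:
        record("/", f)        # visititems() does not visit the root itself
        f.visititems(visit)   # recurse through every subgroup
    return found


# Usage (hypothetical path):
# for group, (fmt, version) in find_tagged_groups("example.mcool").items():
#     print(group, fmt, version)
```

A tool that reads only the root pair, or only a group pair, sees one of the two version numbers out of context, which appears to be the source of the confusion discussed here.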
It's unfortunate if HiCBricks rejects COOL v3 outright because all that is required is to check that the storage-mode is symmetric-upper and there should be no incompatibilities. The MCOOL format will still be v2.
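The check described above could look roughly like the following sketch. It takes any mapping of a cooler group's attributes (for example the attrs of an h5py group), and the function name is an assumption for illustration. Treating versions before 2 the same way as v2 is also an assumption, and a missing format-version is reported as not readable because nothing can be concluded from it.

```python
def readable_as_symmetric_upper(attrs):
    """Decide whether a v2-era reader can interpret a cooler group.

    attrs: mapping of the group's HDF5 attributes.
    """
    version = attrs.get("format-version")
    if version is None:
        return False  # no version information, so nothing can be assumed
    if int(version) <= 2:
        # v2 always assumes an upper triangle (assumed here for older versions too)
        return True
    mode = attrs.get("storage-mode")
    if isinstance(mode, bytes):
        mode = mode.decode()
    return mode == "symmetric-upper"
```

With this kind of guard a reader can accept v3 files that store an upper triangle and only reject those that declare a different storage mode.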
@nvictus it seems to accept them actually, but hic2cool update needs to be run after hic2cool convert so the headers aren't missing format attributes. I managed to get HiCBricks to parse an mcool file but I had to resolve file permissions, fix a bug with my mismatched resolution settings (10 million was too high), and use a non-default normalisation parameter.
I think the developer is right to clarify that the tool was tested on older versions and not guaranteed to be stable with updated dependencies. The logs display the version parsed before the error so we interpreted this disclaimer carefully and investigated compatibility of various versions of hic2cool in conda environments.
Provided mcool is a version 3 file. Error in rep(chr, Offset) : invalid 'times' argument
It appears the issue was actually a mismatch between resolution settings for my reference genome and the data to import. Therefore I think no further changes (except those suggested in this PR) are needed for hic2cool or HiCBricks for them to be interoperable.
If I find any other problems, I'll open a separate issue or PR. Thanks for your feedback and for considering improvements to the (already comprehensive) documentation.
Any reason this hasn't been merged yet (if approved)? I understand maintainers have other commitments and this project has not been actively developed recently.
Thanks for your great package. We've managed to correct issues with our reference genome and import the MCOOL data in the latest format into 3rd party R packages discussed above. This patch resolved the issue in our case.
I think the block was us not having merge permissions. @alexander-veit, can this be merged?
Thank you for the PR.
@clarabakker kindly agreed to maintain this repository going forward. She will move the dependency management to Poetry and will release a new version soon(ish).
Closes #65
Resolves this problem calling "hic2cool update ..." in the bash command line:
I checked this error occurred with both hic2cool v0.7.1 and 0.8.3.
Note that this workaround no longer works:
Current versions of h5py open files with "r" (read-only) as the default mode. Note that "w" will not allow reading the original files because it truncates them, so "r+" or "a" is needed. https://docs.h5py.org/en/stable/high/file.html
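For reference, a minimal sketch of the mode behaviour described above. The helper names are hypothetical and this is not necessarily how the patch itself is written. It passes the mode explicitly, using "r+" to update an attribute of an existing file in place and "r" for read-only access.

```python
import h5py


def set_root_attr(path, key, value):
    """Update one root attribute of an existing HDF5 file in place.

    "r+" opens the file for reading and writing and fails if it does not
    exist. "w" would instead create a new file and truncate existing content.
    """
    with h5py.File(path, "r+") as f:
        f.attrs[key] = value


def read_root_attr(path, key):
    """Read one root attribute with an explicit read-only mode."""
    with h5py.File(path, "r") as f:
        return f.attrs[key]
```

Passing the mode explicitly avoids depending on the library default, which has differed between h5py releases.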