ay-lab / dcHiC

dcHiC: Differential compartment analysis for Hi-C datasets
MIT License

Pre-processing data for input #26

Closed: papelypluma closed this issue 2 years ago

papelypluma commented 2 years ago

Hi @ay-lab. I'd like to ask how exactly we should process the input data/files for dcHiC. The Wiki tab has instructions on using cooler's dump and preprocessing.py, but I'm wondering where this Python script can be obtained.

Thank you!

papelypluma commented 2 years ago

Hi @ay-lab. I found the branch that contains the scripts for pre-processing data for dcHiC. I should have checked there first.

ay-lab commented 2 years ago

Hi, thank you for bringing this up. The old script should work; I also just uploaded a new, more streamlined one to 'utility'. Both should dump cool files the same way, though. Please let us know if any other issues crop up!

ay-lab commented 2 years ago

Hi, sorry to reopen the issue, but while doing some testing afterward I realized that the output of cooler dump is 0-indexed, whereas dcHiC requires its input to be 1-indexed. I will update the processing script tomorrow to reflect that.
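As a minimal sketch of the 0-to-1 index shift (not the updated script in 'utility'), assume the pixel table was dumped with `cooler dump -t pixels` (tab-separated `bin1_id bin2_id count`, no header) and the bin table with `cooler dump -t bins` (`chrom start end`, possibly followed by a weight column). The helper names below are hypothetical, and the sketch assumes dcHiC only needs the bin IDs renumbered, leaving genomic coordinates exactly as cooler reports them.

```python
"""Hypothetical helpers: renumber 0-based cooler bin IDs as 1-based.

Assumption: pixel lines are `bin1_id<TAB>bin2_id<TAB>count` and bin lines are
`chrom<TAB>start<TAB>end[...]`, as produced by `cooler dump` without --header.
Only bin IDs are shifted; start/end coordinates are left untouched.
"""
from typing import Iterable, Iterator


def to_one_based_pixels(lines: Iterable[str]) -> Iterator[str]:
    """Yield each pixel line with bin1_id and bin2_id incremented by 1."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Keep only the first three columns (extra columns, e.g. a balanced value, are dropped).
        bin1, bin2, count = line.split("\t")[:3]
        yield f"{int(bin1) + 1}\t{int(bin2) + 1}\t{count}"


def to_one_based_bins(lines: Iterable[str]) -> Iterator[str]:
    """Append a 1-based bin index to each `chrom start end` line of a bins dump."""
    non_blank = (raw for raw in lines if raw.strip())
    for index, line in enumerate(non_blank, start=1):
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        yield f"{chrom}\t{start}\t{end}\t{index}"
```

Both functions accept any iterable of lines, so an open file handle or the captured output of `cooler dump` can be passed in directly; streaming line by line keeps memory use flat for large contact maps.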

ay-lab commented 2 years ago

Just updated it and did a test run with some sample cool files (check 'utility'!)

papelypluma commented 2 years ago

Thanks for checking this one out, @ay-lab! I have another question, though; I'm not sure whether it would be better to open a separate discussion. I'm wondering whether the tool can be used for organisms other than human and mouse. I've noticed that the --genome parameter defaults to NA, but leaving it unspecified prevents the tool from running. Is there a way to provide custom files for this parameter?

papelypluma commented 2 years ago

Hi @ay-lab. It looks like the tool works with non-model organisms, or at least with organisms that are not yet among the --genome options. I managed to get dcHiC working on a non-model organism (beyond the existing human and mouse options for --genome) by supplying custom files in a pre-existing directory whose name and file prefixes match the value passed to --genome. The limitation, I guess, would be the cytoband file and the creation of the IGV HTML for visualization. Nonetheless, the relevant output files are still generated, and no non-zero exit status has been observed.

ay-lab commented 2 years ago

That sounds exactly right! Thank you for letting us know about this 👍. We actually updated the code today after seeing your earlier comment, and the cytoband info isn't necessary for the actual dcHiC run. There's now a run option, --gfolder, pointing to a folder in which the user must supply {genome}.fa, {genome}.tss.bed, and {genome}.chrom.sizes. This makes your process of creating a correctly named directory more explicit.
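Before pointing --gfolder at a custom genome, a quick pre-flight check can catch missing files. The sketch below is hypothetical (the function names are illustrative, not part of dcHiC): it checks only for the three {genome}.* files named above, and it can derive a chrom.sizes file from the FASTA. It does not validate file contents the way dcHiC itself might, and it assumes each chromosome name is the first word of a FASTA header line.

```python
"""Hypothetical pre-flight helpers for a custom --gfolder directory.

Assumption: the folder must contain {genome}.fa, {genome}.tss.bed and
{genome}.chrom.sizes; only file presence is checked here.
"""
from pathlib import Path
from typing import Dict, List, Optional


def missing_gfolder_files(folder: str, genome: str) -> List[str]:
    """Return the expected file names that are absent from `folder`."""
    expected = [f"{genome}.fa", f"{genome}.tss.bed", f"{genome}.chrom.sizes"]
    return [name for name in expected if not (Path(folder) / name).is_file()]


def chrom_sizes_from_fasta(fasta_path: str) -> Dict[str, int]:
    """Sum sequence length per FASTA record, keyed by the first word of its header."""
    sizes: Dict[str, int] = {}
    current: Optional[str] = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                parts = line[1:].split()
                current = parts[0] if parts else ""
                sizes[current] = 0
            elif current is not None:
                sizes[current] += len(line)  # sequence lines add to the current record
    return sizes


def write_chrom_sizes(sizes: Dict[str, int], out_path: str) -> None:
    """Write `name<TAB>length` lines, the usual chrom.sizes layout."""
    with open(out_path, "w") as out:
        for name, length in sizes.items():
            out.write(f"{name}\t{length}\n")
```

For large genomes, `samtools faidx` produces a .fai index whose first two columns give the same name/length pairs, which may be faster than reading the FASTA in pure Python.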

papelypluma commented 2 years ago

Thanks, @ay-lab, for including this additional option; it makes the analysis more streamlined.