Closed wbszhu closed 1 year ago
I think this tool would also be useful for format conversion. Maybe add a parameter (--conversion-only) to request a pure format conversion instead of a liftover.
@Nanguage @wbszhu this is a good idea. I will add this function and keep you updated!
Hi all, I've added this function to v0.1.3. To perform a pure format conversion without liftover, just set --in-assembly and --out-assembly to the same assembly name. Here is an example command that converts a contact matrix from the .cool format to the .hic format:
$ pairLiftOver --input Rao2014-K562-MboI-allreps-filtered.5kb.cool --input-format cooler \
--out-pre K562-format-conversion-test --output-format hic --out-chromsizes hg19.chrom.sizes \
--in-assembly hg19 --out-assembly hg19 --memory 40G
Let me know your feedback!
Here is my feedback. I used the following command:
for i in LW-2
do
    for res in 40000
    do
        pairLiftOver --input ../${i}.balance.mcool::/resolutions/${res} --input-format cooler \
            --out-pre ${i}_${res} --output-format pairs \
            --out-chromsizes ~/22.genome/susScr11_xy/pig11.1_from_star_chrom.sizes \
            --in-assembly ~/22.genome/susScr11_xy/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa \
            --out-assembly ~/22.genome/susScr11_xy/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa \
            --logFile pairLiftOver.log
    done
done
pairLiftOver.log
root INFO @ 03/01/22 11:11:07:
# ARGUMENT LIST:
# Input path = ../LW-2.balance.mcool::/resolutions/40000
# Input format = cooler
# Output prefix = LW-2_40000
# Output format = pairs
# Chromosome Sizes of the output assembly = /public/home/luzhang/22.genome/susScr11_xy/pig11.1_from_star_chrom.sizes
# Generate contact maps at 11 resolutions = False
# Input assembly = /public/home/luzhang/22.genome/susScr11_xy/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa
# Output assembly = /public/home/luzhang/22.genome/susScr11_xy/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa
# Chain file = None
# Temporary Dir = .pairliftover
# Allocated memory = 8G
# Number of Processes = 8
# Log file name = pairLiftOver.log
root INFO @ 03/01/22 11:11:27: Trying to perform a pure format conversion without liftover ...
pairLiftOver.utilities INFO @ 03/01/22 11:11:27: Writing headers ...
pairLiftOver.utilities INFO @ 03/01/22 11:11:27: Dumping contact pairs from ../LW-2.balance.mcool::/resolutions/40000 ...
pairLiftOver.utilities INFO @ 03/01/22 11:16:17: Done
zcat LW-2_40000.pairs.gz |head
## pairs format v1.0.0
#shape: upper triangle
#genome_assembly: /public/home/luzhang/22.genome/susScr11_xy/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa
#chromsize: chr1 274330532
#chromsize: chr2 151935994
#chromsize: chr3 132848913
#chromsize: chr4 130910915
#chromsize: chr5 104526007
#chromsize: chr6 170843587
#chromsize: chr7 121844099
zcat LW-2_40000.pairs.gz |grep -v "#"|head
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
. chr1 20000 chr1 20000 . .
...
zcat LW-2_40000.pairs.gz |grep -v "#"|tail
. chrY 43500000 chrY 43500000 . .
. chrY 43500000 chrY 43500000 . .
. chrY 43500000 chrY 43500000 . .
. chrY 43500000 chrY 43500000 . .
. chrY 43500000 chrY 43500000 . .
. chrY 43500000 chrY 43533914 . .
. chrY 43500000 chrY 43533914 . .
. chrY 43533914 chrY 43533914 . .
. chrY 43533914 chrY 43533914 . .
. chrY 43533914 chrY 43533914 . .
...
With the approach from my previous comment (fake chain file):
zcat ../pairs/LW-2_40000.pairs.gz |grep -v "#"|wc -l
63809963
And with the approach in this comment (pure format conversion):
zcat LW-2_40000.pairs.gz |grep -v "#"|wc -l
63820471
I also compared the contents of the two files, and they are completely different.
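For anyone who wants to make this kind of comparison more systematic, here is a minimal sketch. The function names `count_pairs` and `summarize_pairs` are hypothetical helpers, not part of pairLiftOver. It assumes the standard seven-column pairs layout seen in the output above (readID, chr1, pos1, chr2, pos2, strand1, strand2). Note that `grep -v "#"` drops any line containing `#`, while `'^#'` only drops lines that start with it.

```shell
# Count data records in a gzipped .pairs file, skipping only header
# lines (those that begin with '#').
count_pairs() {
    gzip -dc "$1" | grep -vc '^#'
}

# Collapse repeated records into "chr1 pos1 chr2 pos2 count" rows and
# sort them, so two files can be compared regardless of record order.
summarize_pairs() {
    gzip -dc "$1" |
        awk '!/^#/ { n[$2 " " $3 " " $4 " " $5]++ }
             END { for (k in n) print k, n[k] }' |
        LC_ALL=C sort
}

# Example usage (file names taken from the commands above):
#   summarize_pairs ../pairs/LW-2_40000.pairs.gz > old.txt
#   summarize_pairs LW-2_40000.pairs.gz > new.txt
#   diff old.txt new.txt | head
```

Comparing the collapsed summaries rather than the raw files should show which bin pairs gained or lost contacts, which is easier to read than a diff of tens of millions of records.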
I think this is reasonable. I cannot guarantee the accuracy of your previous results, because they were based on a fake chain file and an unnecessary liftover (which might introduce errors). The new version doesn't do any liftover at all; it simply extracts contact pairs from your cool file.
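To illustrate what "extracting contact pairs from a cool file" can look like, here is a rough sketch built on the `cooler dump --join` command from the cooler CLI. This is not necessarily how pairLiftOver does it internally. The helper name `pixels_to_pairs` is hypothetical, and placing each contact at the bin midpoint is an assumption, though it seems consistent with the 20000 positions in the 40 kb output above. The sketch writes only the record lines, not the pairs header.

```shell
# Read joined pixel rows (chrom1 start1 end1 chrom2 start2 end2 count)
# from stdin and print one pairs-style record per contact, with each
# position set to the midpoint of its bin (an assumption, see above).
pixels_to_pairs() {
    awk '{
             mid1 = int(($2 + $3) / 2)
             mid2 = int(($5 + $6) / 2)
             # Repeat the record once per raw contact in the pixel.
             for (i = 0; i < $7; i++)
                 printf ".\t%s\t%d\t%s\t%d\t.\t.\n", $1, mid1, $4, mid2
         }'
}

# Example usage (assumes the cooler CLI is installed; the output file
# name is only an illustration):
#   cooler dump --join ../LW-2.balance.mcool::/resolutions/40000 \
#       | pixels_to_pairs | gzip -c > LW-2_40000.records.gz
```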
Thanks, I think you are right. LuZhang
Hi, I have implemented the conversion from cool format to pairs format using the following command,
with a fake --chain-file pig_chain.txt, like this.
And the pairLiftOver.log:
The result looks like this:
Am I right? Would you consider adding a simple function to make this work?