ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

hal2maf output doesn't contain all the bases of the reference genome #1362

Closed chun-he-316 closed 2 months ago

chun-he-316 commented 2 months ago

Hello, I have a question. I ran the following command to obtain the HAL file:

docker run -v $(pwd):/home/xxx/Documents/software/cactus-v2.8.0 --security-opt seccomp:unconfined --rm quay.io/comparative-genomics-toolkit/cactus:v2.8.0 cactus /home/xxx/Documents/software/cactus-v2.8.0/js /home/xxx/Documents/software/cactus-v2.8.0/evolver27species.txt /home/xxx/Documents/software/cactus-v2.8.0/evolver27species.hal

Then I ran the following command to convert the HAL file into a MAF file:

docker run -v $(pwd):/home/xxx/Documents/software/cactus-v2.8.0 --security-opt seccomp:unconfined --rm quay.io/comparative-genomics-toolkit/cactus:v2.8.0 cactus-hal2maf /home/xxx/Documents/software/cactus-v2.8.0/js_hal2maf /home/xxx/Documents/software/cactus-v2.8.0/evolver27species.hal /home/xxx/Documents/software/cactus-v2.8.0/evolver27species.maf.gz --refGenome xxxxxxx --chunkSize 500000 --dupeMode consensus --logFile /home/xxx/Documents/software/cactus-v2.8.0/evolver27species.maf.gz.log

I found that the MAF file does not contain all the scaffolds and loci of the reference genome. What is the reason for this? Is this normal? Thank you.

glennhickey commented 2 months ago

Reference intervals are excluded if they do not align to anything or are scaffold gaps (which don't align anyway).

From what I remember, this behaviour is there to conform to what the genome browser expects. But you're not the first person to ask about this. Perhaps it would be better to leave these regions in the MAF and only filter them out for the browser in cactus-maf2bigmaf.
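To check which reference intervals were left out of a given MAF, one option is to tally the reference rows and report the stretches that no block covers. The following is only a minimal sketch, not part of cactus: it assumes the first "s" row of every block is the reference genome, and the function names (open_maf, reference_intervals, uncovered) are made up for illustration.

```python
import gzip
from collections import defaultdict


def open_maf(path):
    """Open a plain or gzip-compressed MAF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def reference_intervals(lines):
    """Yield (name, start, end, src_size) for the first 's' row of each block.

    Assumption: the first 's' row after each 'a' line is the reference.
    """
    want_ref = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            want_ref = True
        elif fields[0] == "s" and want_ref:
            name, strand = fields[1], fields[4]
            start, size, src_size = int(fields[2]), int(fields[3]), int(fields[5])
            if strand == "-":
                # MAF minus-strand starts count from the reverse-complemented end.
                start = src_size - start - size
            yield name, start, start + size, src_size
            want_ref = False


def uncovered(intervals):
    """Return {name: [(start, end), ...]} of reference stretches with no block."""
    spans_by_seq = defaultdict(list)
    sizes = {}
    for name, start, end, src_size in intervals:
        spans_by_seq[name].append((start, end))
        sizes[name] = src_size
    gaps = {}
    for name, spans in spans_by_seq.items():
        pos, missing = 0, []
        for start, end in sorted(spans):
            if start > pos:
                missing.append((pos, start))
            pos = max(pos, end)
        if pos < sizes[name]:
            missing.append((pos, sizes[name]))
        gaps[name] = missing
    return gaps
```

Usage would look like uncovered(reference_intervals(open_maf("evolver27species.maf.gz"))). Scaffolds that have no block at all never appear in the MAF, so they are not listed by this sketch; finding those needs a separate list of reference scaffold names and lengths.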

chun-he-316 commented 2 months ago

How can I leave these regions in the MAF? In addition, I would like to ask a question unrelated to cactus: how can I extract the corresponding sequences, or a multiple sequence alignment, from the MAF for the intervals in a BED file produced by phastCons?

chun-he-316 commented 2 months ago

Is there a difference between the results of cactus-hal2maf and those of the hal2maf tool in HAL?

glennhickey commented 2 months ago

No, the filter's kind of baked in -- the only way to turn it off is to comment out the code. I will move it out of cactus-hal2maf and into cactus-maf2bigmaf for the next release though. I think the documentation discusses some of the normalization that cactus-hal2maf does.
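On the separate question of pulling alignments out of a MAF for BED intervals (such as those from phastCons), one possible approach is to clip each MAF block to the interval using the reference row. This is a rough sketch under stated assumptions, not a cactus or HAL feature: the reference is the first "s" row of each block and is on the "+" strand, the BED chromosome name matches the MAF source name exactly (for example "genome.contig"), and read_blocks, clip_block and extract are hypothetical helper names.

```python
def read_blocks(lines):
    """Yield each MAF block as a list of (src, start, size, strand, src_size, text)."""
    block = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            if block:
                yield block
            block = []
        elif fields[0] == "s":
            src, start, size, strand, src_size, text = fields[1:7]
            block.append((src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


def clip_block(rows, bed_start, bed_end):
    """Keep only the columns whose reference base lies in [bed_start, bed_end).

    Assumption: rows[0] is the reference row and is on the '+' strand.
    Returns None when the block does not overlap the interval.
    """
    _, ref_start, ref_size, _, _, ref_text = rows[0]
    lo, hi = max(bed_start, ref_start), min(bed_end, ref_start + ref_size)
    if lo >= hi:
        return None
    pos, col_lo, col_hi = ref_start, None, None
    for col, ch in enumerate(ref_text):
        if ch == "-":
            continue  # gap columns do not consume a reference position
        if pos == lo:
            col_lo = col
        if pos == hi - 1:
            col_hi = col + 1
            break
        pos += 1
    clipped = []
    for src, start, size, strand, src_size, text in rows:
        skipped = sum(c != "-" for c in text[:col_lo])
        piece = text[col_lo:col_hi]
        kept = sum(c != "-" for c in piece)
        if kept:  # drop rows that are all gaps inside the interval
            clipped.append((src, start + skipped, kept, strand, src_size, piece))
    return clipped


def extract(blocks, intervals):
    """Clip every block against every (chrom, start, end) BED interval."""
    out = []
    for block in blocks:
        for chrom, start, end in intervals:
            if block[0][0] == chrom:
                clipped = clip_block(block, start, end)
                if clipped:
                    out.append(clipped)
    return out
```

The nested loop in extract is quadratic and only meant for small inputs; for a whole-genome MAF, sorting or indexing the intervals by scaffold first would be the obvious next step.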