ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Cactus Output Files #22

Open bjea opened 6 years ago

bjea commented 6 years ago

Hello there,

May I ask a question about the output HAL files? I am trying to use h5diff to compare two output files that should be the same (both were produced with the same configFile) but were generated by different runs. The command I used is:

docker run --rm -ti -v /tmp:/mnt --entrypoint=h5diff hdfgroup/hdf5-json /mnt/pestis_output1.hal /mnt/pestis_new_output1.hal

However, it reports a large number of differences. The result is as follows:

[Image: screenshot of the h5diff output]

In addition to this one, I have tried many other combinations, e.g.,

However, the results are all similar: many differences are found. Do you happen to know why? Thank you!

Sincerely,

bettie

P.S. I would like to attach some output HAL files and configuration files (e.g. blockTrim1.xml, blockTrim3.xml), but it does not allow me to.

joelarmstrong commented 6 years ago

Hi Bettie,

There are a couple of reasons for this. On one level, there is bound to be a small amount of noise in the alignment results even between identical runs, ultimately because the order in which jobs run differs from run to run. That is relatively minor. On the HDF5 level there are going to be tons of differences, because there are many possible HDF5 representations of the exact same alignment (differing only in the ordering of sequences within a genome, or similar), and we don't really make much of an effort to generate the same representation consistently.
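The ordering point can be illustrated with a toy model. This is only a sketch under stated assumptions: the records below are hypothetical and are not the actual HAL/HDF5 layout. It shows why an element-by-element diff reports differences for data that is identical once ordering is ignored.

```python
# Toy model (hypothetical, not the real HAL layout): a genome stored as a
# list of (sequence name, sequence length) records, written in a different
# order by each run.
run1 = [("chr1", 1000), ("chr2", 500), ("plasmid", 90)]
run2 = [("plasmid", 90), ("chr1", 1000), ("chr2", 500)]


def positional_diffs(a, b):
    """Count slots that differ when the lists are compared element by element."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def same_content(a, b):
    """Compare the records while ignoring the order in which they are stored."""
    return sorted(a) == sorted(b)


print(positional_diffs(run1, run2))  # prints 3: every slot differs
print(same_content(run1, run2))      # prints True: same records, new order
```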

To compare the alignments we usually use mafComparator from the mafTools suite. It compares the files on the alignment level, checking how many pairwise alignment relationships are shared between the two. You can export each HAL file with something like

hal2maf <halFile> --refGenome 2.ANT1a --noAncestors --onlyOrthologs output.maf

then compare the resulting MAFs.
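To make "pairwise alignment relationships" concrete, here is a minimal Python sketch of the idea. It is not mafComparator's implementation, and the function names, genome names, and MAF snippets are made up for illustration. It assumes well-formed MAF `s` lines, maps everything to forward-strand coordinates, and ignores orientation.

```python
from itertools import combinations


def column_coords(src, start, strand, src_size, text):
    """Map each alignment column of one MAF row to (src, position) or None for a gap."""
    coords, offset = [], 0
    for ch in text:
        if ch == "-":
            coords.append(None)
            continue
        pos = start + offset
        if strand == "-":
            # MAF reverse-strand starts count from the end of the sequence.
            pos = src_size - 1 - pos
        coords.append((src, pos))
        offset += 1
    return coords


def aligned_pairs(maf_text):
    """Return the set of pairs of positions that share a column in some block."""
    pairs, rows = set(), []
    for line in maf_text.splitlines() + ["a"]:  # sentinel flushes the last block
        fields = line.split()
        if fields and fields[0] == "s":
            src, start, _size, strand, src_size, text = fields[1:7]
            rows.append(column_coords(src, int(start), strand, int(src_size), text))
        elif not fields or fields[0] == "a":
            for column in zip(*rows):
                present = [c for c in column if c is not None]
                pairs.update(frozenset(p) for p in combinations(present, 2))
            rows = []
    return pairs


maf_a = """##maf version=1
a
s genomeA.chr1 0 4 + 10 ACGT
s genomeB.chr1 2 3 + 8 AC-T

a
s genomeA.chr1 6 2 + 10 GG
s genomeB.chr1 5 2 + 8 GG
"""

# Same alignment, with the blocks and the rows written in a different order.
maf_b = """##maf version=1
a
s genomeB.chr1 5 2 + 8 GG
s genomeA.chr1 6 2 + 10 GG

a
s genomeB.chr1 2 3 + 8 AC-T
s genomeA.chr1 0 4 + 10 ACGT
"""

# A third file that lacks the second block of maf_a.
maf_c = maf_a.split("\n\n")[0]

pairs_a = aligned_pairs(maf_a)
print(maf_a == maf_b)                                 # prints False
print(pairs_a == aligned_pairs(maf_b))                # prints True
print(len(pairs_a & aligned_pairs(maf_c)) / len(pairs_a))  # prints 0.6
```

The two files differ as text but contain exactly the same aligned pairs, which is the level at which the comparison is meaningful. For real data the actual tool should be used rather than this sketch.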

bjea commented 6 years ago

Got it. Will do. Thank you very much for your detailed explanation, Joel!!!