glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

.hal output with nothing aligned on near-identical single-sequence genome assemblies #14

Closed MarioStanke closed 9 years ago

MarioStanke commented 10 years ago

While my progressiveCactus installation works fine on other genomes, it does not seem to align anything on this input of three bacterial genomes of the same species that are very similar but not identical. Each genome consists of a single sequence with names NC_002952, NC_016941, NC_017331, respectively.

Running

wget http://bioinf.uni-greifswald.de/bioinf/tmp/cactus/test.cactus
wget http://bioinf.uni-greifswald.de/bioinf/tmp/cactus/NC_002952.fa
wget http://bioinf.uni-greifswald.de/bioinf/tmp/cactus/NC_016941.fa
wget http://bioinf.uni-greifswald.de/bioinf/tmp/cactus/NC_017331.fa
wget http://bioinf.uni-greifswald.de/bioinf/tmp/cactus/cactus_progressive_config.xml

runProgressiveCactus.sh --configFile=cactus_progressive_config.xml test.cactus cactusout test.hal
halStats test.hal

produces

hal v2.1
((NC_002952:0.1,NC_016941:0.1)Anc1:0.1,NC_017331:0.1)Anc0;

GenomeName, NumChildren, Length, NumSequences, NumTopSegments, NumBottomSegments
Anc0, 2, 0, 1, 0, 0
Anc1, 2, 0, 0, 0, 0
NC_002952, 0, 2902619, 1, 1, 0
NC_016941, 0, 2762785, 1, 1, 0
NC_017331, 0, 3043210, 1, 1, 0

It looks like the ancestral sequences are empty and

hal2maf --noAncestors --refGenome NC_002952 test.hal test.maf

does not produce any alignment with more than 1 row.
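The symptom described here (ancestral genomes with a length of 0) can be checked mechanically. The following is only a sketch: it assumes the halStats output has exactly the layout quoted in this post (a header row starting with GenomeName, followed by comma-separated rows), and the function name empty_ancestors is made up for illustration; it is not part of the HAL tools.

```python
import csv
import io

def empty_ancestors(halstats_text):
    """Return names of genomes that have children but a length of 0."""
    lines = halstats_text.splitlines()
    # Locate the header row of the per-genome table.
    start = next(i for i, line in enumerate(lines)
                 if line.startswith("GenomeName"))
    # Collect table rows up to the first blank line (or end of text).
    table = []
    for line in lines[start:]:
        if not line.strip():
            break
        table.append(line)
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(io.StringIO("\n".join(table)),
                            skipinitialspace=True)
    return [row["GenomeName"] for row in reader
            if int(row["NumChildren"]) > 0 and int(row["Length"]) == 0]

# The table reported in this issue.
SAMPLE = """hal v2.1
((NC_002952:0.1,NC_016941:0.1)Anc1:0.1,NC_017331:0.1)Anc0;

GenomeName, NumChildren, Length, NumSequences, NumTopSegments, NumBottomSegments
Anc0, 2, 0, 1, 0, 0
Anc1, 2, 0, 0, 0, 0
NC_002952, 0, 2902619, 1, 1, 0
NC_016941, 0, 2762785, 1, 1, 0
NC_017331, 0, 3043210, 1, 1, 0
"""

print(empty_ancestors(SAMPLE))
```

On the table above this flags both Anc0 and Anc1, which matches the observation that nothing was aligned.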

cactus_progressive_config.xml differs from the file that comes with the distribution only in this parameter:

filterByIdentity="0"

It also does not produce an alignment if I omit the tree, or if I use the default config file.

joelarmstrong commented 10 years ago

I will double-check this in a bit, but at first glance I think the problem here may be that the genomes are entirely lowercase, and so the entire genome is considered soft-masked?

benedictpaten commented 10 years ago

Yes - that must be it! I think it would be appropriate to print a warning message to the log file (this would be reported from cactus_setup.c) if an input contig is entirely lower-case.

I will do this today.
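Until such a warning exists in the aligner itself, a similar check can be run on the input files beforehand. The sketch below is not the cactus_setup.c change discussed here; it is a standalone illustration, and the function names lowercase_fractions and warn_if_all_lowercase are hypothetical. It assumes plain FASTA input in which soft-masking is encoded as lowercase letters.

```python
import sys

def lowercase_fractions(fasta_lines):
    """Map each sequence name to the fraction of its letters that are lowercase."""
    counts = {}  # name -> [lowercase letters, total letters]
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            counts[name] = [0, 0]
        elif name is not None:
            counts[name][0] += sum(1 for c in line if c.islower())
            counts[name][1] += sum(1 for c in line if c.isalpha())
    return {n: (low / total if total else 0.0)
            for n, (low, total) in counts.items()}

def warn_if_all_lowercase(fasta_lines, out=sys.stderr):
    """Print a warning for every entirely lowercase sequence; return their names."""
    names = [n for n, frac in lowercase_fractions(fasta_lines).items()
             if frac == 1.0]
    for n in names:
        print(f"WARNING: sequence {n} is entirely lowercase and may be "
              "treated as fully soft-masked", file=out)
    return names
```

For a file on disk, something like `with open("NC_002952.fa") as f: warn_if_all_lowercase(f)` would report the affected sequences before the alignment is started.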

On Fri, Mar 7, 2014 at 9:20 AM, Joel Armstrong notifications@github.com wrote:

I will double-check this in a bit, but at first glance I think the problem here may be that the genomes are entirely lowercase, and so the entire genome is considered soft-masked?


MarioStanke commented 10 years ago

I will retry with upper case sequences on Monday. Thanks.
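For reference, converting the sequences to upper case takes only a few lines. This is a minimal sketch, and the function names are invented for illustration. Note that upper-casing discards all soft-masking, including any genuine repeat annotation, and the thread does not confirm whether the retry resolved the problem.

```python
def uppercase_fasta(lines):
    """Yield FASTA lines with sequence letters upper-cased; headers are unchanged."""
    for line in lines:
        yield line if line.startswith(">") else line.upper()

def uppercase_fasta_file(src, dst):
    """Write an upper-cased copy of the FASTA file src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(uppercase_fasta(fin))
```

A call such as `uppercase_fasta_file("NC_002952.fa", "NC_002952.upper.fa")` writes a new file (the output name is arbitrary), which would then have to be referenced from test.cactus in place of the original.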

epaule commented 9 years ago

Yup, a warning or error message would help a lot.

benedictpaten commented 9 years ago

Yes +1, we should fix this.

On Tue, Sep 23, 2014 at 6:20 AM, Michael Paulini notifications@github.com wrote:

Yup, a warning or error message would help a lot.
