FePhyFoFum / phyx

phylogenetics tools for linux (and other mostly posix compliant) computers
blackrim.org
GNU General Public License v3.0
111 stars 17 forks source link

pxaa2cdn -r not working as expected #172

Closed NatJWalker-Hale closed 2 years ago

NatJWalker-Hale commented 2 years ago

Hey Joseph,

The new -r option to pxaa2cdn appears to not work as expected. Here is a little example case:

test.cds.fa.txt test.pep.fa.txt

pxaa2cdn -r -a test.pep.fa.txt -n test.cds.fa.txt gives: Error: for taxon '1' nucleotide alignment involves 3 codons, but protein alignment involves 2 amino acids. Skipping. Error: for taxon '2' nucleotide alignment involves 3 codons, but protein alignment involves 2 amino acids. Skipping.

Which is the same output as pxaa2cdn -a test.pep.fa.txt -n test.cds.fa.txt

Seems that -r is not actually activating the ignoring of the last codon yet?

Thanks,

Nat

NatJWalker-Hale commented 2 years ago

Having a peek, I think (though could be wrong), that this is because there is no check for -r while checking for the mismatch of lengths. I guess that will have to check if they're off by one if -r is invoked. I'll try fiddle with it my end and submit a PR if it works, unless you get to it first.

josephwb commented 2 years ago

Can confirm. Sorry about that. I will have a look later today. It should take a few minutes, but I am heading out the door.

josephwb commented 2 years ago

Ok I see what is going on. Be back soon.

josephwb commented 2 years ago

Should be fixed with 9f15c7ae6f2723701b158514dfdb9eb7427064e3

josephwb commented 2 years ago

Sorry I missed your PR. I think mine works well (will put in a few tests). Yours looks fine if you can assume all files are what they say they are (nuc and associated pep). If you've somehow mixed up files, your fix will allow the program to proceed (i.e., counts are explicitly set to be equal), but would crash if the lengths are not what they should be (say, a nuc alignment shorter than the peptide on). Just thinking of ways people can break things (>ᴗ•)

NatJWalker-Hale commented 2 years ago

That works! Although I note that it makes the codon count in the error message for mismatching lengths off by one, e.g.

test2.cds.fa.txt

pxaa2cdn -r -a test.pep.fa.txt -n test2.cds.fa.txt Error: for taxon '1' nucleotide alignment involves 4 codons, but protein alignment involves 2 amino acids. Skipping. Error: for taxon '2' nucleotide alignment involves 4 codons, but protein alignment involves 2 amino acids. Skipping.

But test2.cds.fa.txt has 5 codons, etc...

Anyway, I see what you mean about my PR, will close it! Many thanks for dealing with it so quickly.

josephwb commented 2 years ago

Erg yes I'll fix the warning.

NatJWalker-Hale commented 2 years ago

legend, thanks!