ComparativeGenomicsToolkit / taffy

A C/Python/CLI library for working with TAF (.taf, .taf.gz) and MAF (.maf) alignment files
MIT License

taffy norm seems to miss straightforward case #38

Closed glennhickey closed 7 months ago

glennhickey commented 7 months ago

Input MAF:

##maf version=1

a
s   x   0   9   +   50  CAAATAAGG
s   y   0   8   +   50  CAAATAAG-
s   z   0   9   +   50  CAAATAAGG

a
s   y   8   1   +   50  A

a
s   x   9   41  +   50  CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
s   y   9   41  +   50  CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
s   z   9   41  +   50  CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG

Running this command leaves the file unchanged:

taffy view -i input.maf | taffy norm -k

glennhickey commented 7 months ago

This is the "raw" output of vg2maf. So as it stands, normalization is able to connect consecutive nodes, but does not handle the variant.

##maf version=1

a
s   x   0   8   +   50  CAAATAAG
s   y   0   8   +   50  CAAATAAG
s   z   0   8   +   50  CAAATAAG

a
s   x   8   1   +   50  G
s   z   8   1   +   50  G

a
s   y   8   1   +   50  A

a
s   x   9   1   +   50  C
s   y   9   1   +   50  C
s   z   9   1   +   50  C

a
s   x   10  3   +   50  TTG
s   y   10  3   +   50  TTG
s   z   10  3   +   50  TTG
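
To make the distinction between "consecutive nodes" and "the variant" concrete, here is a minimal Python sketch, not taffy's actual algorithm. It parses the raw blocks above and reports which adjacent pairs are trivially joinable, meaning they contain exactly the same sequences and each sequence continues where it left off. The names `parse_blocks` and `trivially_joinable` are hypothetical, and the sketch assumes every row is on the + strand, as in this example.

```python
from typing import NamedTuple


class Row(NamedTuple):
    src: str
    start: int
    size: int


# The raw vg2maf output from this comment (sequence text kept for realism).
RAW_MAF = """\
##maf version=1

a
s x 0 8 + 50 CAAATAAG
s y 0 8 + 50 CAAATAAG
s z 0 8 + 50 CAAATAAG

a
s x 8 1 + 50 G
s z 8 1 + 50 G

a
s y 8 1 + 50 A

a
s x 9 1 + 50 C
s y 9 1 + 50 C
s z 9 1 + 50 C

a
s x 10 3 + 50 TTG
s y 10 3 + 50 TTG
s z 10 3 + 50 TTG
"""


def parse_blocks(maf_text):
    """Return one dict per 'a' block, mapping sequence name -> Row."""
    blocks, current = [], None
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the ##maf header
        if fields[0] == "a":
            current = {}
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            current[fields[1]] = Row(fields[1], int(fields[2]), int(fields[3]))
    return blocks


def trivially_joinable(left, right):
    """True if both blocks hold the same sequences and each one is contiguous."""
    if left.keys() != right.keys():
        return False
    return all(left[s].start + left[s].size == right[s].start for s in left)


blocks = parse_blocks(RAW_MAF)
print([trivially_joinable(a, b) for a, b in zip(blocks, blocks[1:])])
# prints [False, False, False, True]
```

Only the last pair (the `C` and `TTG` blocks) passes this strict test. The blocks around the G/A variant contain different sets of sequences, so joining them requires gap-padding the missing rows.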


benedictpaten commented 7 months ago

taffy norm doesn't merge these blocks because of the -q parameter, which sets the minimum fraction of sequences that two blocks must share in order to be merged. By default this is set to 0.6. Setting it to 0, i.e.:

./bin/taffy view -i temp.maf | ./bin/taffy norm -q 0 | ./bin/taffy view -m

I get:

##maf version=1

a
s   x   9   41  +   50  ----------CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
s   x   0   9   +   50  CAAATAAGG------------------------------------------
s   y   0   50  +   50  CAAATAAG-ACTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
s   z   9   41  +   50  ----------CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
s   z   0   9   +   50  CAAATAAGG------------------------------------------

This does the merge, but now I need to figure out why it doesn't properly merge the x and z rows together.
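
As a rough illustration of why the default threshold blocks the merge here, the sketch below computes a shared-sequence fraction for two blocks and compares it to a cutoff. The exact formula taffy uses is not stated in this thread. The sketch assumes the fraction is the number of shared sequence names divided by the number of distinct names across both blocks, and the function names `shared_fraction` and `would_merge` are hypothetical.

```python
def shared_fraction(left_names, right_names):
    """Fraction of distinct sequence names that appear in both blocks.

    Assumption: shared / union. taffy's real definition may differ.
    """
    left, right = set(left_names), set(right_names)
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def would_merge(left_names, right_names, min_fraction=0.6):
    """Apply a -q style cutoff (0.6 is the default mentioned above)."""
    return shared_fraction(left_names, right_names) >= min_fraction


# First block of the input has x, y, z; the variant block has only y.
print(would_merge(["x", "y", "z"], ["y"]))                    # cutoff 0.6
print(would_merge(["x", "y", "z"], ["y"], min_fraction=0.0))  # cutoff 0
```

Under this assumed definition the two blocks share one of three sequences, which is below 0.6, so only a cutoff of 0 lets the merge through.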

benedictpaten commented 7 months ago

Okay, using PR #40 I get:

##maf version=1

a
s   x   0   50  +   50  CAAATAAGG-CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
s   y   0   50  +   50  CAAATAAG-ACTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG
s   z   0   50  +   50  CAAATAAGG-CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG

That is with "./bin/taffy view -i ./temp.maf | ./bin/taffy norm | ./bin/taffy view -m", which I think was the intent of the test. All the tests run on the branch, but I want to clean up the docs and add some more tests; after that I think this should be good.
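
A quick way to sanity-check a merged block like the one above is to confirm that every row has the same number of alignment columns and that each row's size field equals its count of non-gap bases, as the MAF format requires. The Python sketch below does that. It is an independent check, not part of taffy, and `parse_s_line` and `check_block` are hypothetical names.

```python
def parse_s_line(line):
    """Split a MAF 's' line into (src, start, size, strand, src_size, text)."""
    tag, src, start, size, strand, src_size, text = line.split()
    if tag != "s":
        raise ValueError("not an 's' line: " + line)
    return src, int(start), int(size), strand, int(src_size), text


def check_block(lines):
    """True if rows share one column count and sizes match ungapped lengths."""
    rows = [parse_s_line(line) for line in lines]
    same_width = len({len(text) for *_, text in rows}) == 1
    sizes_ok = all(size == len(text.replace("-", ""))
                   for _, _, size, _, _, text in rows)
    in_bounds = all(start + size <= src_size
                    for _, start, size, _, src_size, _ in rows)
    return same_width and sizes_ok and in_bounds


merged = [
    "s x 0 50 + 50 CAAATAAGG-CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG",
    "s y 0 50 + 50 CAAATAAG-ACTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG",
    "s z 0 50 + 50 CAAATAAGG-CTTGGAAATTTTCTGGAGTTCTATTATATTCCAACTCTCTG",
]
print(check_block(merged))
```

Each row here spans 51 columns: x and z carry a gap opposite y's inserted A, and y carries a gap opposite their G.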

benedictpaten commented 7 months ago

Merged #40