McGill-CSB / PHYLO

a gaming framework to align genomic data
phylo.cs.mcgill.ca/edge

Question: Why is this score 35, and not 36 #117

Open movermeyer opened 6 years ago

movermeyer commented 6 years ago

This was the starting position of a puzzle: odd_score

Why is the displayed score 35, and not 36?

1 gap         = -4
0 gap extends =  0
40 matches    = 40
0 mismatches  =  0
              -----
                36

Tested on PHYLO Web v2.0.1

waldispuhl commented 6 years ago

Sorry for the delay, I hadn't noticed your comment earlier. We actually modified the scoring scheme and are now counting gap extends at the extremities (no gap open, though). We will post more info on the website ASAP.

movermeyer commented 6 years ago

Thanks.

I've been trying to reverse engineer the scoring algorithm to get a better understanding of what makes a good alignment. But I keep getting tripped up by cases that don't seem to match the counts displayed at the top.

I'm looking forward to seeing the new code and better docs/info on the site.

waldispuhl commented 6 years ago

We have developed a customized version of the Fitch algorithm. Are you familiar with it? I'll try to retrieve a PDF explaining the exact scheme and post it. We are planning to release more material about it in the upcoming month.

movermeyer commented 6 years ago

I can't say I'm familiar with it (or with genomics in general; I'm just learning now). I'll take a look. I found this nice-looking PDF for Fitch's Algorithm, but it'll be nice to see how yours differs.
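For readers who, like me, are new to it: here is a minimal sketch of the standard (textbook) Fitch small-parsimony pass for a single alignment column. This is not PHYLO's customized version, which has not been published in this thread. The tree encoding (nested 2-tuples with single-letter leaves) and the function name `fitch` are assumptions made only for illustration.

```python
def fitch(tree):
    """Bottom-up Fitch pass on a rooted binary tree for one character.

    A leaf is a one-letter string such as "A"; an internal node is a
    2-tuple (left, right) of subtrees. This encoding is hypothetical.
    Returns (candidate_states, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        # A leaf has exactly one possible state and needs no change.
        return {tree}, 0

    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)

    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra change.
        return common, left_changes + right_changes
    # The children disagree: keep the union and count one change.
    return left_states | right_states, left_changes + right_changes + 1


states, changes = fitch((("A", "C"), ("A", "G")))
print(states, changes)
```

For the small tree in the usage line, each inner pair disagrees (one change each) and the root can then be "A", so the pass reports two changes in total.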

akashzcoder commented 6 years ago

There is an additional attribute in scoring type2: a trail score penalty. It was added to PHYLO only recently, which is why the game GUI does not reflect the trail score yet. The new score works like this:

subScore = (match * 1) + (mismatch * (-1)) + (open * (-4)) + (extend * (-1)) + (trail * (-1));

38951307-bba9efc8-4337-11e8-8e36-d6b61f82f172-2

This puzzle occupied a length of 20 (the maximum number of nucleotides in each sequence) in the original gene sequences, so an increase in length gives a trail penalty to the other sequence. This is because, when the solution is aggregated into the original MSA, this trail will also behave like a gap.
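The formula above can be checked against the numbers from the original question. The sketch below is only an illustration: the function name `sub_score` and the `WEIGHTS` table are hypothetical, the weights are copied from the formula in this comment, and the trail count of 1 for this puzzle is an assumption inferred from the displayed score of 35.

```python
# Per-event weights taken from the formula quoted in this thread.
WEIGHTS = {
    "match": 1,
    "mismatch": -1,
    "open": -4,
    "extend": -1,
    "trail": -1,
}


def sub_score(match, mismatch, gap_open, extend, trail=0):
    """Weighted sum of alignment events for one pair of sequences."""
    return (
        match * WEIGHTS["match"]
        + mismatch * WEIGHTS["mismatch"]
        + gap_open * WEIGHTS["open"]
        + extend * WEIGHTS["extend"]
        + trail * WEIGHTS["trail"]
    )


# Counts shown in the game GUI (no trail term): 40 - 4 = 36.
print(sub_score(match=40, mismatch=0, gap_open=1, extend=0))

# Same counts plus one trailing position (assumed): 40 - 4 - 1 = 35.
print(sub_score(match=40, mismatch=0, gap_open=1, extend=0, trail=1))
```

Under that assumption, the one-point difference between the expected 36 and the displayed 35 is exactly the trail penalty.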

akashzcoder commented 6 years ago

The following concern has been addressed. Please find the update below: image_new

chrisdrogaris commented 6 years ago

Hi movermeyer

An updated build with the trail parameter added can be found at this link: https://phylo.cs.mcgill.ca/play.html

movermeyer commented 6 years ago

@chrisdrogaris Thanks. While I now understand why it exists, it doesn't have an explanation yet in the info dialog:

explanation

Two other questions:

  1. When does the version number change? It still says v2.0.1. Or is that link considered "beta" and the version hasn't been bumped yet?
  2. When will the code be made available on GitHub?