brouwern / compbio2021

Assignments for Computational Biology Fall 2021 at the University of Pittsburgh

Percent identity vs. score #44

Open cmp171 opened 3 years ago

cmp171 commented 3 years ago

https://github.com/brouwern/compbio2021/blob/4fc22d26cd66119a177fd9572bd409ffa989216d/KEY-MSA-walkthrough-shroom.Rmd#L391 When we calculate the percent identity of an alignment of 2 sequences ourselves, should we take the indels into account? There were a few questions on the practice exam where there are indels and those are included as differences in the correct calculation of percent identity.

brouwern commented 3 years ago

Which questions were those? Indels don’t normally count. There was a question with a tricky layout where dots represented the same nucleotide as the reference sequence, and it’s easy to think they were supposed to be indels.
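For anyone checking a percent identity (PID) calculation by hand, here is a minimal Python sketch of the two conventions discussed above: skipping columns that contain an indel (the usual approach in this course, per the comment above) versus counting them as differences. The function name `percent_identity`, the `count_gaps` flag, and the example sequences are made up for illustration; they are not taken from the course materials or the practice exam.

```python
def percent_identity(seq_a, seq_b, count_gaps=False, gap="-"):
    """Percent identity of two already-aligned sequences.

    count_gaps=False: columns containing a gap are skipped entirely.
    count_gaps=True:  columns containing a gap count as differences.
    (Hypothetical helper for illustration only.)
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        has_gap = a == gap or b == gap
        if has_gap and not count_gaps:
            continue  # leave indel columns out of the denominator
        compared += 1
        if a == b and not has_gap:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Made-up alignment: 8 matches, 1 mismatch, 1 indel column
aligned_a = "ATGC-TAGGA"
aligned_b = "ATGCATACGA"

pid_without_gaps = percent_identity(aligned_a, aligned_b)                # 8 of 9 columns
pid_with_gaps = percent_identity(aligned_a, aligned_b, count_gaps=True)  # 8 of 10 columns
print(round(pid_without_gaps, 1), round(pid_with_gaps, 1))
```

The only difference between the two results is the denominator, so it is worth stating which convention was used whenever a PID value is reported.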

--

Nathan L. Brouwer, PhD

Lecturer

Department of Biological Sciences https://www.biology.pitt.edu/

University of Pittsburgh

Biostatistics course: brouwern.github.io/BIOSC_1120/index.html

Research Associate

National Aviary, Dept. of Conservation & Field Research https://www.aviary.org/conservation

R code: github.com/brouwern

R tweets: @lobrowR https://twitter.com/lobrowR

cmp171 commented 3 years ago

Oh, yes, that is the question I was referring to. I accidentally read the dots as indels rather than as positions identical to the reference sequence, so I was confused about the PID calculation. Thank you!
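In case it helps anyone else who hits the same layout, here is a rough Python sketch of how dot notation can be expanded before calculating PID, assuming each dot means "same nucleotide as the reference at this position." The helper name `expand_dots` and the sequences are invented for illustration and do not come from the practice exam.

```python
def expand_dots(reference, row, dot="."):
    """Replace each dot in `row` with the reference base at that position.

    Hypothetical helper: assumes `row` is aligned to `reference`
    and that a dot means "identical to the reference here".
    """
    if len(reference) != len(row):
        raise ValueError("row must be the same length as the reference")
    return "".join(r if c == dot else c for r, c in zip(reference, row))


reference = "ATGCATAGGA"
dotted = "....G..C.."  # dots are matches, not indels

expanded = expand_dots(reference, dotted)

# Count identical positions between the reference and the expanded row
identical = sum(a == b for a, b in zip(reference, expanded))
pid = 100.0 * identical / len(reference)
print(expanded, pid)
```

Read this way, every dot is a match, so only the two positions with letters written out count as differences.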