Closed Rohit-Satyam closed 1 year ago
I just realized I have been using an amino-acid (AA) substitution matrix above, when I probably need a nucleotide substitution matrix, so I tried the following. However, I get this error:
nmx <- nucleotideSubstitutionMatrix(match = 1, mismatch = 0, baseOnly = FALSE, type = "DNA")
msaConservationScore(ali, substitutionMatrix = nmx)
Error in msaConservationScore.matrix(mat, ...) :
some letters occurring in alignment 'x' are missing in substitution matrix
What should I do? I think the error arises because the gap character (-) is missing from the nucleotide substitution matrix.
EDIT1: I built the DNA substitution matrix as follows and the error went away. However, I would like confirmation that this is the right way to do it.
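nmx <- nucleotideSubstitutionMatrix(match = 2, mismatch = -1, baseOnly = FALSE, type = "DNA")
# Column 15 is 'N'; reuse its scores for every letter versus a gap.
nmx <- cbind(nmx, `-` = nmx[, "N"])
# Gap row: the same 'N' scores, plus -0.25 for gap versus gap.
nmx <- rbind(nmx, `-` = c(nmx[, "N"], -0.25))
msaConservationScore(ali, substitutionMatrix = nmx)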
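The same gap extension can be written so that it does not depend on the position of 'N' in the matrix. The following is only a sketch in base R: add_gap_scores() is a hypothetical helper, not a function from msa or Biostrings, and the small toy matrix stands in for the real output of nucleotideSubstitutionMatrix(). It assumes a square matrix whose row and column names are identical.

```r
# Hypothetical helper (not part of msa or Biostrings): add a gap symbol to a
# square substitution matrix by copying the scores of an existing letter.
add_gap_scores <- function(mat, from = "N", gap = "-", gap_gap = -0.25) {
  stopifnot(is.matrix(mat), identical(rownames(mat), colnames(mat)))
  stopifnot(from %in% rownames(mat))
  # New column: each letter versus gap scores like that letter versus `from`.
  out <- cbind(mat, mat[, from])
  # New row: gap versus each letter, with the gap-versus-gap score at the end.
  out <- rbind(out, c(mat[from, ], gap_gap))
  dimnames(out) <- list(c(rownames(mat), gap), c(colnames(mat), gap))
  out
}

# Toy 5-letter matrix (match = 2, mismatch = -1) standing in for the real one.
letters5 <- c("A", "C", "G", "T", "N")
toy <- matrix(-1, 5, 5, dimnames = list(letters5, letters5))
diag(toy) <- 2

toy_gap <- add_gap_scores(toy)
```

With a real matrix, the call would presumably be add_gap_scores(nmx), but that has not been checked against the alignment in this thread.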
@Rohit-Satyam, as specified in the package vignette on p. 25, "[...] msaConservationScore() computes sums of pairwise scores for a given substitution/scoring matrix". These values are therefore not confined to the unit interval. Negative values indicate highly variable sequence positions, while high positive values indicate high levels of conservation. That explains the numbers you observed. As you already noticed, the substitution matrix also needs to score bases versus gaps. It seems you decided to copy the 'N' column/row into a '-' column/row. I agree that this is a reasonable solution, so I am closing this issue. Thanks for using our package and for contributing this discussion!
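To see why a sum of pairwise scores is not bounded by 0 and 1, here is a minimal base R sketch that scores a single alignment column. It is an illustration of the idea only: column_pair_score() and the toy matrix are made up for this example, and the exact computation inside msaConservationScore() may differ (for instance in how pairs are counted or weighted).

```r
# Illustrative only: score one alignment column as the sum of substitution
# scores over all unordered pairs of sequences (self-pairs excluded here).
column_pair_score <- function(column, mat) {
  stopifnot(all(column %in% rownames(mat)))
  pairs <- utils::combn(column, 2)          # 2 x choose(n, 2) character matrix
  sum(mat[cbind(pairs[1, ], pairs[2, ])])   # look up each pair by letter name
}

# Toy matrix: match = 2, mismatch = -1.
bases <- c("A", "C", "G", "T")
toy <- matrix(-1, 4, 4, dimnames = list(bases, bases))
diag(toy) <- 2

conserved <- column_pair_score(c("A", "A", "A", "A"), toy)  # 6 pairs, each +2
variable  <- column_pair_score(c("A", "C", "G", "T"), toy)  # 6 pairs, each -1
```

Under this simplified scheme the magnitude grows with the number of sequence pairs; 899 sequences give choose(899, 2) = 403651 pairs per column, so large positive or negative values are to be expected.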
Hi,
I have an MSA of 899 DNV viral genomes and have imported the .aln file obtained from Kalign3 into R. I now wish to calculate conservation scores to find conserved blocks. I expected this score to range between 0 and 1, but instead I get the following results
How do I interpret these numbers?
Code used: