Closed donal1 closed 2 years ago
Hi! We do have the function for computing tm-align scores in the python api (not in the commandline). This is a class method of the StructureMultiple class called make_rmsd_coverage_tm_matrix and returns the tm-scores as the third object. You can save the StructureMultiple class using the commandline tool with the --class flag, and load it with python.
The computed distance matrix is based on geometricus as described in our paper, or based on the caretta score if using the --full flag.
I could be confused, but is the pairwise tm score of proteins the same as the tm score over a multiple alignment? I looked at the reported tm score for the AAA family in HOMSTRAD, computed with mtm align: 0.622. They also output a pairwise tm score, but this seems to be different. Is it possible to compute the tm score for the multiple alignment as well as for the pairwise? Thanks again.
In general, the pairwise scores are different from the multiple alignment scores. The pairwise score is the optimal alignment score (based on the distance matrix and gap parameters) between the two structures; the multiple alignment algorithm, however, uses heuristics (guide tree and progressive alignment step) to compute the multiple alignment efficiently.
For the second part of your question: yes, you can get pairwise tm-scores from the multiple sequence alignment, but as I mentioned above, this won't be the optimal result between the pairs, as the multiple sequence alignment is meant to be used to compare all structures at once.
Okay, this is what I want to know. I'm trying to compare the reported mtm-align results with caretta just to verify. The HOMSTRAD dataset has a reported tm score for the multiple alignment of proteins in each family, and mTM-align also has these reported scores. I am trying to do the same for caretta and obviously I'm missing something. What is the function to get the multiple alignment tm score?
Also, just to clarify, since I think I'm missing something: the pairwise score from caretta isn't the same as the mtm-align pairwise matrix? As mtm-align is very time-consuming and caretta is quite fast, I was using caretta to get tm-scores between proteins.
Okay, this is what I want to know. I'm trying to compare the reported mtm-align results with caretta just to verify. The HOMSTRAD dataset has a reported tm score for the multiple alignment of proteins in each family, and mTM-align also has these reported scores. I am trying to do the same for caretta and obviously I'm missing something. What is the function to get the multiple alignment tm score?
You can get the pairwise tm-scores from the python api by using the StructureMultiple class method make_rmsd_coverage_tm_matrix. If needed, I can make an example gist to show how you can use it.
Also, just to clarify, since I think I'm missing something: the pairwise score from caretta isn't the same as the mtm-align pairwise matrix? As mtm-align is very time-consuming and caretta is quite fast, I was using caretta to get tm-scores between proteins.
Do you mean the commandline score matrix from caretta? If so, then those are caretta scores and not tm-scores. If you used the make_rmsd_coverage_tm_matrix method, then I will take a look at why there's a discrepancy between the outputs from the two tools (assuming the alignments are identical).
Thanks, I know how to get the pairwise scores, but https://yanglab.nankai.edu.cn/mTM-align/benchmark/homstrad.html reports a tm score from the multiple alignment; it doesn't report a pairwise score. ATPase family associated with various cellular activities (AAA) reports a single score of 0.513. What I want is to report a single tm score for a family, not pairwise scores.
What I am trying to do is see whether the tm score reported from the caretta multiple alignment is better or worse than mtm align's, and compare this score to the reported HOMSTRAD score, as it can be taken as the ground truth.
It's mentioned in the mtm-align paper that the single tm-score is the mean of the pairwise tm-scores. Then you could do the same to the pairwise output from caretta.
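As a minimal sketch of that averaging step, assuming the pairwise tm-scores are already available as a square NumPy array (the function name `mean_pairwise_tm` and the example values are made up for illustration and are not part of caretta or mTM-align):

```python
import numpy as np


def mean_pairwise_tm(tm_matrix):
    """Average the off-diagonal entries of a square pairwise TM-score matrix."""
    tm = np.asarray(tm_matrix, dtype=float)
    if tm.ndim != 2 or tm.shape[0] != tm.shape[1]:
        raise ValueError("expected a square matrix")
    n = tm.shape[0]
    if n < 2:
        raise ValueError("need at least two structures")
    # Mask out the diagonal so self-comparisons (always 1.0) do not inflate the mean.
    off_diagonal = ~np.eye(n, dtype=bool)
    return float(tm[off_diagonal].mean())


# Hypothetical 3-structure family; the values are illustrative only.
example = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.7],
    [0.5, 0.7, 1.0],
])
family_score = mean_pairwise_tm(example)
print(round(family_score, 3))
```

Averaging over all off-diagonal entries also works if the matrix is not symmetric, since each ordered pair is counted once.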
ahh it is great thank you very much I missed that I thought it was something more complex. Thanks again for the quick replies you guys are saving me so much time have a really great weekend.
Apparently it isn't exactly the mean; it's the mean normalised by the smaller protein. It's unclear what they mean in the paper. Would you have any idea?
I've narrowed it down to the mtmalign.cpp file.
Sorry this may be a long post with a few questions.
Caretta produces pairwise tm scores, but these scores do not come from an all-vs-all pairwise alignment like in mTM-align. As you said, this won't be the optimal result between the pairs, as the multiple sequence alignment is meant to be used to compare all structures at once. So the pairwise matrices produced by each method are different? mTM-align produces the optimal scores and caretta finds the conserved and variable residues across a set of proteins.
Does normalisation occur twice in mTM-align? The TM-score formula divides by the length of the target of interest as well as by d0, the distance scale. This occurs in caretta as well as in mTM-align to produce the tm score between proteins. But mTM-align also performs a secondary normalisation. As reported in their paper: "We can calculate the number of structurally equivalent residues (Lali), the associated Root-Mean-Square Deviation (RMSD) and TM-score. Here the TM-score is normalized by the length of the smaller protein. Because the reference MSTAs are available for the HOMSTRAD dataset, we can define another metrics accuracy (ACC)." So if one wants to compare the accuracy of Caretta against mTM-align on the HOMSTRAD dataset, would the caretta results need to be normalised by the smaller protein? If the results from caretta were not normalised this way, I think they would be very bad in comparison to mTM-align, but I need to be sure.
So I just want to check the ability of caretta to output a similarity score on a cluster of proteins and compare this to mTM align.
I went through the mtm align code; they seem to only compute the tm score normalised by the smaller protein in the get_TMscore_from_seqxa function. This is also what they state in the paper.
You guys, on the other hand, as seen in the tm_score function in the multiple alignment .py, compute the tm score for both and return the higher. This means you are normalising by both lengths and choosing the max tm score.
Also I don't understand why you guys compute the square root of the common coordinates squared in line 47 and 48 in tm_score. This appears to be different than the reported tm algorithm.
Caretta produces pairwise tm scores, but these scores do not come from an all-vs-all pairwise alignment like in mTM-align. As you said, this won't be the optimal result between the pairs, as the multiple sequence alignment is meant to be used to compare all structures at once. So the pairwise matrices produced by each method are different? mTM-align produces the optimal scores and caretta finds the conserved and variable residues across a set of proteins.
mTM align also produces multiple structure alignment TM scores as far as I know, as it is a multiple structure alignment algorithm, though with a different underlying method than caretta. For pairwise TM scores you would have to use TM-align (https://zhanggroup.org/TM-align/) instead.
You guys, on the other hand, as seen in the tm_score function in the multiple alignment .py, compute the tm score for both and return the higher. This means you are normalising by both lengths and choosing the max tm score.
Choosing the so called "target" protein for TM-score calculation could be by the shortest or longest or taking the max as we do. We don't use TM score in Caretta so we didn't experiment with changing these.
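To make the three normalisation choices concrete, here is a rough sketch using the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8. It assumes the residue pairs are already aligned and superposed, so it does not search for the best superposition as TM-align does. The function names and the 0.5 Å floor on d0 are assumptions of this sketch, not code taken from caretta or mTM-align.

```python
import numpy as np


def d0(norm_length):
    """Distance scale d0 for a given normalising length."""
    # The 0.5 floor keeps d0 positive for very short chains (assumption of this sketch).
    return max(1.24 * np.cbrt(norm_length - 15.0) - 1.8, 0.5)


def tm_score(distances, norm_length):
    """TM-score from distances (in Angstrom) between aligned, superposed residue pairs."""
    d = np.asarray(distances, dtype=float)
    return float(np.sum(1.0 / (1.0 + (d / d0(norm_length)) ** 2)) / norm_length)


def tm_score_variants(distances, length_1, length_2):
    """Return the score normalised by the shorter chain, the longer chain, and the max of both."""
    by_shorter = tm_score(distances, min(length_1, length_2))
    by_longer = tm_score(distances, max(length_1, length_2))
    return {
        "by_shorter": by_shorter,
        "by_longer": by_longer,
        "max_of_both": max(by_shorter, by_longer),
    }


# Illustrative case: 80 perfectly superposed residue pairs, chains of 100 and 200 residues.
variants = tm_score_variants(np.zeros(80), 100, 200)
print(variants)
```

With all distances at zero, each score reduces to aligned pairs divided by the normalising length, which shows how strongly the choice of target length can change the reported value for the same alignment.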
Also I don't understand why you guys compute the square root of the common coordinates squared in line 47 and 48 in tm_score. This appears to be different than the reported tm algorithm.
This was a bug; I've fixed it on master now. Thanks for spotting!
I'm actually happy with the non-pairwise score. I want a tm score that relates to a group of proteins, so the optimal pairwise score is unimportant. What is important is that, if the pairwise scores from the multiple structure alignment are output, they are on par with mtm-align, which they should be if your results are correct.
I just want to compare the tm score given for the homstrad families and the reported tm scores from mtm align.
https://yanglab.nankai.edu.cn/mTM-align/benchmark/homstrad.html, the mtm-align and the HOMSTRAD family scores are reported here.
I've also emailed you the mean pairwise scores that caretta outputs for each family. The scores are quite bad and I'm sure it's due to the normalisation or bugs.
Yeah, thanks for all the help. I realise that the HOMSTRAD scores I was comparing to were inappropriate. It would make more sense to compare to accuracy.
Hello, may I have an example for the usage of make_rmsd_coverage_tm_matrix? Thank you :)
@lingnus1 see #18
Hi there,
I was wondering if it's possible to output the tm-scores between proteins. Also what exactly is the distance matrix outputted?
All the best