Reading your code, I believe there is no real check on whether a particular protein complex, as defined in a particular model for a particular reaction, is actually satisfied. In my opinion this can lead to wrong GPRs.
Below I report your code from carveme/reconstruction/scoring.py, which converts the 'gene_scores' table into 'protein_scores' and finally 'protein_scores' into 'reaction_scores'. Then I give a practical example.
From gene to protein scores:
    def merge_subunits(genes):
        genes = genes.dropna()

        if len(genes) == 0:
            return None
        else:
            protein = ' and '.join(sorted(genes))
            if len(genes) > 1:
                return '(' + protein + ')'
            else:
                return protein


    def merge_subunit_scores(scores):
        return scores.fillna(0).mean()


    protein_scores = gene_scores.groupby(['protein', 'reaction', 'model'], as_index=False) \
        .agg({'query_gene': merge_subunits, 'score': merge_subunit_scores})

    protein_scores.rename(columns={'query_gene': 'GPR'}, inplace=True)
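To make the effect of these two helpers concrete, here is a minimal pure-Python sketch that mirrors their logic for a single two-subunit complex in which one subunit has no hit. It is not the CarveMe code itself, and the input values are assumptions: the complex is taken to have exactly two subunit rows, and 864.0 is simply the score that gives the 432.0 mean seen for P_STM0970+STM0973 in my example.

```python
import math


def merge_subunits(genes):
    """Mirror of the pandas helper: drop missing hits, join the rest with 'and'."""
    hits = [g for g in genes if g is not None]
    if not hits:
        return None
    protein = ' and '.join(sorted(hits))
    return '(' + protein + ')' if len(hits) > 1 else protein


def merge_subunit_scores(scores):
    """Mirror of scores.fillna(0).mean(): a missing subunit counts as 0."""
    filled = [0.0 if math.isnan(s) else s for s in scores]
    return sum(filled) / len(filled)


# Complex P_STM0970+STM0973: STM0970 has no hit, STM0973 matched gene_18818.
# The 864.0 score is an assumption inferred from the 432.0 mean.
genes = [None, 'gene_18818']
scores = [float('nan'), 864.0]

gpr = merge_subunits(genes)
score = merge_subunit_scores(scores)
print(gpr, score)
```

The missing subunit simply disappears from the GPR string and only halves the score, so the incomplete complex ends up looking like a valid single-gene protein.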
Applying the above code would result in this 'protein_score' table:
| protein | reaction | model | GPR | score |
|---|---|---|---|---|
| P_STM0843+STM0844 | R_PFL | STM_v1_0 | (gene_13306 and gene_7500) | 470.0 |
| P_STM0970+STM0973 | R_PFL | STM_v1_0 | gene_18818 | 432.0 |
| P_STM0970+STM3241 | R_PFL | STM_v1_0 | gene_18818 | 423.0 |
| P_STM4114+STM4115 | R_PFL | STM_v1_0 | None | 0.0 |
And finally in this 'reaction_score' table:
| reaction | GPR | score | normalized_score |
|---|---|---|---|
| R_PFL | ((gene_13306 and gene_7500) or gene_18818) | 470.0 | 1.0 |
As you can see, the final GPR is ((gene_13306 and gene_7500) or gene_18818).
The member “(gene_13306 and gene_7500)” is correct, as both components of the complex P_STM0843+STM0844 were well matched by Diamond.
The member “gene_18818”, instead, should not appear in the final GPR in my opinion, because neither the complex P_STM0970+STM0973 nor the complex P_STM0970+STM3241 is satisfied: Diamond found no match for the gene STM_v1_0.STM0970 (score = NaN).
This simplified example assumes a 'gene_scores' table with just 1 model and 1 reaction, but I think it is enough to highlight the issue.
With a real-life 'gene_scores' table, other models containing the same reaction may happen to make the final GPR correct, balancing out the errors. But that is just chance...
Maybe it could be useful to have a dedicated option, like carve --strictgpr, for users who want the original protein complex definitions to be strictly satisfied.
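As a rough illustration of what such a strict mode could do, here is a minimal pure-Python sketch. The names `strict_protein_gpr` and `reaction_gpr` are hypothetical and do not exist in CarveMe. The "or" merge is my assumption about how protein GPRs are combined into a reaction GPR, and the assignment of gene_13306 and gene_7500 to STM0843 and STM0844 is also assumed, since only the merged complex appears above.

```python
def strict_protein_gpr(subunit_hits):
    """Return an 'and' GPR only if every subunit of the complex has a hit."""
    hits = list(subunit_hits.values())
    if not hits or any(h is None for h in hits):
        return None  # incomplete complex: do not contribute to the GPR
    protein = ' and '.join(sorted(hits))
    return '(' + protein + ')' if len(hits) > 1 else protein


def reaction_gpr(protein_gprs):
    """Assumed merge step: unique, non-missing protein GPRs joined with 'or'."""
    unique = sorted({p for p in protein_gprs if p is not None})
    if not unique:
        return None
    joined = ' or '.join(unique)
    return '(' + joined + ')' if len(unique) > 1 else joined


# Subunit -> matched query gene (None = no Diamond hit) for reaction R_PFL.
complexes = {
    'P_STM0843+STM0844': {'STM0843': 'gene_13306', 'STM0844': 'gene_7500'},
    'P_STM0970+STM0973': {'STM0970': None, 'STM0973': 'gene_18818'},
    'P_STM0970+STM3241': {'STM0970': None, 'STM3241': 'gene_18818'},
    'P_STM4114+STM4115': {'STM4114': None, 'STM4115': None},
}

# Protein GPRs as they come out of the current, non-strict aggregation.
current_gpr = reaction_gpr(
    ['(gene_13306 and gene_7500)', 'gene_18818', 'gene_18818', None])
strict_gpr = reaction_gpr(strict_protein_gpr(h) for h in complexes.values())

print(current_gpr)
print(strict_gpr)
```

With the strict check, only the fully matched complex P_STM0843+STM0844 survives, and gene_18818 no longer appears in the GPR of R_PFL.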
Using carveme v1.5.2.