ebi-pf-team / genome-properties

GNU General Public License v3.0

GenProp1696 consists of the same protein twice. #48

Open SilasK opened 6 years ago

SilasK commented 6 years ago

The pathway GenProp1696 (Curcumin degradation) consists of two genes with the same annotation, which doesn't seem to be specific for the substrate.

LornaMGnify commented 6 years ago

Hi Silas,

Unfortunately, non-specific evidence is one of the potential consequences of automatically predicting evidence from single proteins, as was done in the automatic creation of MetaCyc properties like this. I have had a look and there doesn't seem to be a suitable specific protein signature to use as evidence at this time, so the entry will be removed from our next release.

Thanks for reporting this,
Lorna.

SilasK commented 6 years ago

I completely understand.

However, a pathway consisting of the same InterPro domain twice shouldn't be a pathway, no?

On 6 Nov 2018, 15:01 +0100, happy-lorna wrote:

Hi Silas, Unfortunately non-specific evidence is one of the potential consequences of automatically predicting evidence from single proteins, as was done in the automatic creation of MetaCyc properties like this. I have had a look and there doesn't seem to be a suitable specific protein signature to use as evidence at this time, so the entry will be removed from our next release. Thanks for reporting this, Lorna.

LornaMGnify commented 6 years ago

I agree that in most cases this would not be good practice. However, in this case the two steps in the pathway are each performed by the two functions of a bifunctional protein, so if we had a good specific model for that bifunctional protein, it would make logical sense to have the same model as evidence for both steps. It does make for a pretty "minimal" genome property though.
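As a rough illustration of the situation described above, here is a minimal Python sketch that flags evidence reused across several steps of one property. The dict layout, the function name `shared_evidence`, and the accession `SIG_A` are all hypothetical placeholders, not the actual Genome Properties data model or a real signature.

```python
from collections import defaultdict


def shared_evidence(steps):
    """Return evidence IDs cited by more than one step.

    `steps` maps a step number to a list of evidence IDs (hypothetical layout).
    The result maps each reused evidence ID to the sorted steps that cite it.
    """
    users = defaultdict(list)
    for step, evidence_ids in steps.items():
        # Use a set so an ID repeated within one step is counted once.
        for evidence_id in set(evidence_ids):
            users[evidence_id].append(step)
    return {ev: sorted(s) for ev, s in users.items() if len(s) > 1}


# Hypothetical two-step property where both steps cite the same signature,
# as would happen for a bifunctional protein.
two_step_property = {1: ["SIG_A"], 2: ["SIG_A"]}
print(shared_evidence(two_step_property))  # {'SIG_A': [1, 2]}
```

A check like this only reports the reuse. Whether it is acceptable (a bifunctional protein) or a curation problem still needs a human decision.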

SilasK commented 6 years ago

Dear @happy-lorna, what would be "a good specific model for that bifunctional protein"?

LornaMGnify commented 6 years ago

A good specific model would be one that matches examples of the protein in question (in this case NADPH-dependent curcumin/dihydrocurcumin reductase) without matching other, unrelated proteins. The model that was automatically selected matches a C-terminal domain found in alcohol dehydrogenases. While this does match the protein we are trying to detect, it also matches many thousands of other proteins that contain the same domain. Hence it is not a good specific model for NADPH-dependent curcumin/dihydrocurcumin reductase.
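One way to put a number on "specific" is to compare the set of proteins a model matches with the set of known examples of the target protein. The sketch below is only an illustration with made-up protein IDs. The function names `precision` and `recall` and the toy sets are assumptions, not part of any Genome Properties or InterPro tooling.

```python
def precision(matched, true_members):
    """Fraction of matched proteins that really are the target protein."""
    matched = set(matched)
    if not matched:
        return 0.0
    return len(matched & set(true_members)) / len(matched)


def recall(matched, true_members):
    """Fraction of true target proteins that the model matches."""
    true_members = set(true_members)
    if not true_members:
        return 0.0
    return len(set(matched) & true_members) / len(true_members)


# Made-up IDs: two true reductases, plus eight unrelated proteins
# that share the same C-terminal domain.
targets = {"P1", "P2"}
domain_model_hits = targets | {f"X{i}" for i in range(8)}
specific_model_hits = {"P1", "P2"}

print(precision(domain_model_hits, targets), recall(domain_model_hits, targets))      # 0.2 1.0
print(precision(specific_model_hits, targets), recall(specific_model_hits, targets))  # 1.0 1.0
```

In this toy case both models find every true example, but only the second one avoids the unrelated proteins, which is the property described above.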