PRIDE-Archive / xi-mzidentml-converter

Apache License 2.0

Strange protein from Rabbit in a Human dataset #13

Closed ypriverol closed 7 months ago

ypriverol commented 8 months ago

In project PXD035522 (human), we have multiple proteins from rabbit (P02057 and G1SSJ7).

lutzfischer commented 8 months ago

For now I can tell you that the FASTA file that came with the jPOST upload also contains these proteins.

lutzfischer commented 8 months ago

OK, looking at the paper:

We used rabbit reticulocyte lysate (RRL), which has downregulated the quantities of most tRNAs except those necessary to decode the efficiently translated α- and β-globin mRNAs.

It seems to be human CCR4-NOT proteins spiked into rabbit lysate. We could ask Juri or Francis, but I think it is correct to have these nine human proteins, up to 720 rabbit proteins, and one that is called "Nascent_Peptide".

ypriverol commented 8 months ago

One discussion we need to have with the resource is whether we want to expand in the future into more data science pipelines, for example checking if all the reported proteins are part of the organism or other organisms are also reported and what to do with them. We may want to re-annotate the species (adding Rabbit) or in the database say that these proteins are "contaminants".
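As a rough illustration of what such a check could look like, here is a minimal Python sketch. It assumes the submitted FASTA uses UniProt-style headers with an `OX=` taxonomy field, which may not hold for every submission. The function names (`group_by_taxon`, `off_organism`) and the example headers are hypothetical and are not part of the converter or of PXD035522.

```python
import re
from collections import defaultdict

# UniProt-style FASTA headers carry the NCBI taxonomy ID in an "OX=" field.
OX_PATTERN = re.compile(r"\bOX=(\d+)\b")


def group_by_taxon(fasta_lines):
    """Map taxonomy ID (None when the header has no OX field) to accessions."""
    groups = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        first_token = header.split()[0] if header else ""
        # UniProt headers start with "db|ACCESSION|ENTRY_NAME"; otherwise
        # fall back to the first whitespace-delimited token.
        parts = first_token.split("|")
        accession = parts[1] if len(parts) >= 3 else first_token
        match = OX_PATTERN.search(header)
        taxon = int(match.group(1)) if match else None
        groups[taxon].append(accession)
    return dict(groups)


def off_organism(fasta_lines, expected_taxon):
    """Return the entries whose taxonomy ID differs from the declared one."""
    groups = group_by_taxon(fasta_lines)
    return {t: accs for t, accs in groups.items() if t != expected_taxon}


# Made-up headers for illustration only (9606 = human, 9986 = rabbit).
example = [
    ">sp|AAAAA1|PROT1_HUMAN Hypothetical protein one OS=Homo sapiens OX=9606 GN=X1 PE=1 SV=1",
    "MKT",
    ">tr|BBBBB2|PROT2_RABIT Hypothetical protein two OS=Oryctolagus cuniculus OX=9986 PE=3 SV=1",
    "MVH",
    ">Nascent_Peptide",
    "MAA",
]
flagged = off_organism(example, expected_taxon=9606)
print(flagged)
```

A check like this only flags the entries; deciding whether they are reagents, contaminants, or an additional species to annotate would still need a curator.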

lutzfischer commented 8 months ago

checking if all the reported proteins are part of the organism or other organisms are also reported and what to do with them. We may want to re-annotate the species (adding Rabbit)

I am not a biologist and not involved in the project, but I am not sure rabbit needs to be mentioned here. The point of the project was to analyse the human proteins, so human is mentioned. The rabbit proteins are just a means to an end and not the focus of the analysis.

or in the database say that these proteins are "contaminants".

Rabbit is not a contaminant either, and I am not sure how you would reliably identify contaminants.

But then I also think this discussion probably has very little to do with the mzIdentML converter and is rather a separate PRIDE discussion.

ypriverol commented 8 months ago

Fully agree with you, @lutzfischer. I'm just wondering whether, as part of this project of integration between PDB and PRIDE, we want to run data science pipelines to "curate" the original data. For me, these rabbit proteins (as you also said) are not the main proteins under study and, while they are not contaminants, they are artefacts within the MS experiment that do not have biological meaning.

My point with this issue is to take note of it and trigger the discussion about where we want to move next after the infrastructure is deployed, and this is a nice case. Thanks for the feedback.

ypriverol commented 8 months ago

This is another interesting case, @lutzfischer: https://www.ebi.ac.uk/pride/archive/xiview/ws/projects/PXD031632. In this project we can't make sense of the protein accessions.

colin-combe commented 7 months ago

Could this be closed? For the specific case of 'Strange protein from Rabbit in a Human dataset' (PXD035522), the data as it stands accurately reflects the experiment?

ypriverol commented 7 months ago

I will close this issue as suggested by @colin-combe.