epoyraz / mztab

Automatically exported from code.google.com/p/mztab

Peptides table: "accession" value if peptide assigned to multiple proteins #5

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
In the current specification it's stated (page 26): "The protein's accession 
the peptide is associated with. In case no protein section is present in the 
file or the peptide was not assigned to a protein the field should be filled 
with “NA”."

It's not clear from this description how peptides shared by several proteins 
should be treated. Should the value be NA (but then the "unique" column doesn't 
make sense, since it would be true iff the accession is not NA), or should it 
be a comma-separated list of the protein accession codes (in which case the 
"unique" column also looks redundant; maybe it could be replaced by a column 
giving the number of proteins the peptide could be assigned to, 
"num_proteins_shared")?

Original issue reported on code.google.com by astuka...@gmail.com on 30 Nov 2012 at 3:04

GoogleCodeExporter commented 9 years ago
Only one main protein accession needs to be provided. The others can be listed 
in "ambiguity_members". This was done in a very generic way for the sake of 
simplicity. This is the definition of "ambiguity_members":

A comma-delimited list of protein accessions. This field should be set in the
representative protein of the ambiguity group (the protein identified through
the accession in the first column). The accessions listed in this field should
identify proteins that could also be identified through these peptides but were
not chosen by the researcher or resource. The members of the ambiguity group
are not reported in the protein table for the respective unit. The exact
semantics of how the ambiguity members were defined depends on the resource.

The only way to report all the protein accessions a peptide maps to at the same 
level of the hierarchy is to repeat the same peptide entry in separate rows, 
one row per protein.
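
As a rough illustration of the row replication described above, here is a minimal Python sketch. It is not taken from the specification: the function name `expand_peptide_rows`, the three-column layout (sequence, accession, unique), the 0/1 encoding of "unique", and the peptide sequence and accessions are all assumptions made for the example.

```python
def expand_peptide_rows(sequence, accessions):
    """Return one (sequence, accession, unique) row per protein accession.

    Hypothetical helper: the real peptide table has more columns, and the
    0/1 encoding of "unique" is assumed here for illustration only.
    """
    # The peptide counts as unique only when it maps to a single protein.
    unique = 1 if len(accessions) == 1 else 0
    return [(sequence, accession, unique) for accession in accessions]


# Made-up peptide shared by two made-up protein accessions.
rows = expand_peptide_rows("EIEILACEIR", ["P12345", "Q67890"])
for row in rows:
    # Tab-separated output, one line per peptide/protein pair.
    print("\t".join(str(value) for value in row))
```

Each output row carries exactly one accession, so the peptide->protein relation of every single row stays unambiguous, at the cost of repeating the peptide.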

Original comment by javizca74@gmail.com on 30 Nov 2012 at 4:42

GoogleCodeExporter commented 9 years ago
Thanks for the clarification!

The "ambiguity_members" column addresses a slightly different problem. There 
could be peptides shared by proteins that are each unambiguously identified.

Of course, it's possible to duplicate the peptide information for each protein, 
but that would increase the size of the file, and there is a chance (or, at 
least, a risk of confusion) that the quantitative information would differ 
between the rows describing the same peptide. BTW, does the specification 
impose a uniqueness constraint on the peptides table anywhere (i.e. specify a 
"compound unique key")?
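
The consistency concern above could be checked with something like the following sketch. It assumes the rows have already been parsed into dictionaries, and the key names `sequence` and `abundance`, the function name, and all values are hypothetical and not defined by the specification.

```python
from collections import defaultdict


def find_inconsistent_peptides(rows):
    """Return sequences whose duplicated rows disagree on the quantitative value."""
    values_by_sequence = defaultdict(set)
    for row in rows:
        values_by_sequence[row["sequence"]].add(row["abundance"])
    # A sequence is inconsistent if its rows carry more than one distinct value.
    return sorted(seq for seq, values in values_by_sequence.items() if len(values) > 1)


# Made-up rows: the first peptide is duplicated per protein with differing values.
rows = [
    {"sequence": "EIEILACEIR", "accession": "P12345", "abundance": "10.5"},
    {"sequence": "EIEILACEIR", "accession": "Q67890", "abundance": "11.0"},
    {"sequence": "AGLQFPVGR", "accession": "P12345", "abundance": "3.2"},
    {"sequence": "AGLQFPVGR", "accession": "Q67890", "abundance": "3.2"},
]
inconsistent = find_inconsistent_peptides(rows)
print(inconsistent)
```

Values are compared as strings here to avoid floating-point equality questions; a real check might need a numeric tolerance instead.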

Original comment by astuka...@gmail.com on 30 Nov 2012 at 5:03

GoogleCodeExporter commented 9 years ago
Yes, one entry in the peptide table ("one peptide") must be assigned to exactly 
one protein. The "accession" column must contain exactly one protein accession, 
so the relation peptide->protein is unique. Of course, one protein can have 
multiple peptides with the exact same sequence (if identified from different 
spectra, for example).

BTW, there is no unique key defined for the peptide table. 
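
To make the point concrete: because the same sequence can be reported several times for one protein, a pair such as (sequence, accession) cannot serve as a key. The sketch below counts such repeats. The function name, the dictionary keys, and the sample values (including the `spectrum` field) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_key_collisions(rows, key_fields=("sequence", "accession")):
    """Return candidate-key values that occur in more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows: the same peptide identified from two different spectra.
rows = [
    {"sequence": "EIEILACEIR", "accession": "P12345", "spectrum": "scan=101"},
    {"sequence": "EIEILACEIR", "accession": "P12345", "spectrum": "scan=245"},
    {"sequence": "AGLQFPVGR", "accession": "P12345", "spectrum": "scan=310"},
]
collisions = count_key_collisions(rows)
print(collisions)
```

A consumer that needs to index peptide rows would therefore have to fall back on something like the row position rather than on any combination of these columns.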

Original comment by javizca74@gmail.com on 2 Dec 2012 at 9:03