OpenMS / PDCommunityNodes

Community nodes for Thermo Proteome Discoverer

Protein accession strings not consistent #8

Closed hannesveit closed 8 years ago

hannesveit commented 8 years ago

The native PD tables and our result tables use different protein accession strings. After reading the results of the TOPP tools, we should map the protein accessions contained therein back to the native PD accession strings.

hannesveit commented 8 years ago

To avoid reverse-engineering PD's accession parsing rules (that code does not seem to be reusable from within a node), this is now solved by a workaround: the node reads all TargetProteins via the EntityDataService and builds a dictionary mapping the native FASTA accessions to the PD notation, e.g., "sp|P20053|PRP4_YEAST" -> "P20053".
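
For readers who want to see the shape of the lookup, here is a rough Python sketch of the dictionary idea. It is not the node's actual code and does not call the PD API: `TargetProtein`, its two fields, and the helper names are hypothetical stand-ins for whatever the EntityDataService returns. It also assumes that the native accession is the first whitespace-delimited token of the FASTA title line.

```python
from typing import Dict, Iterable, List, NamedTuple


class TargetProtein(NamedTuple):
    """Hypothetical stand-in for a protein record read from PD."""
    fasta_title_line: str  # e.g. ">sp|P20053|PRP4_YEAST some description"
    pd_accession: str      # accession as PD displays it, e.g. "P20053"


def native_accession(title_line: str) -> str:
    """Return the first whitespace-delimited token of a FASTA title line.

    Assumption: this token is the accession string the TOPP tools report.
    """
    tokens = title_line.lstrip(">").split()
    return tokens[0] if tokens else ""


def build_accession_map(proteins: Iterable[TargetProtein]) -> Dict[str, str]:
    """Map native FASTA accessions to PD accessions without parsing rules."""
    mapping: Dict[str, str] = {}
    for protein in proteins:
        key = native_accession(protein.fasta_title_line)
        if key:
            mapping[key] = protein.pd_accession
    return mapping


def to_pd_accessions(native: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Translate accessions from tool results; unknown ones pass through unchanged."""
    return [mapping.get(accession, accession) for accession in native]


proteins = [TargetProtein(">sp|P20053|PRP4_YEAST some description", "P20053")]
accession_map = build_accession_map(proteins)
translated = to_pd_accessions(["sp|P20053|PRP4_YEAST", "unknown_id"], accession_map)
```

Passing unknown accessions through unchanged is just one option for the sketch; a real node might prefer to log or reject them.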