4dsoftware / pride-toolsuite

Automatically exported from code.google.com/p/pride-toolsuite

ProteinAccessionPattern.java Uniprot Accession/Isoform Regex #13

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Hello,

I came across your Java source while looking for a quick UniProt regex. Unless
Java handles regexes very differently from R (which I'm using), I think the
UniProt regex on line 55 of ProteinAccessionPattern.java
(http://pride-toolsuite.googlecode.com/svn-history/r769/pride-inspector/branches/mzidentml/src/main/java/uk/ac/ebi/pride/gui/utils/ProteinAccessionPattern.java)
will miss the isoform suffix whenever the accession begins with O, P, or Q.
For example, using R 3.1.2 with the stringr library:

> str_extract("abcP08730-1xyz","[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?")
[1] "P08730"
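The same behaviour can be reproduced with `java.util.regex`, which suggests the problem is not specific to R. Because `|` has the lowest precedence in the pattern, the trailing `(-[0-9]+)?` belongs only to the second alternative; for an O/P/Q accession the first alternative succeeds on its own and the isoform suffix is left behind. This is a minimal sketch: the class name `OriginalPatternCheck` and the helper `firstMatch` are hypothetical, the pattern is copied from the R session rather than from the Java source, and `A2BC19-3` is only an illustrative, well-formed accession string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternCheck {
    // Pattern copied from the R session in the report. Since '|' binds
    // loosest, the optional isoform group (-[0-9]+)? is attached only to
    // the second alternative ([A-NR-Z]...), not to the O/P/Q one.
    static final Pattern ORIGINAL = Pattern.compile(
            "[OPQ][0-9][A-Z0-9]{3}[0-9]"
            + "|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?");

    // Hypothetical helper: first match found anywhere in the input, or null.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // O/P/Q accession: the first alternative matches and stops before "-1".
        System.out.println(firstMatch(ORIGINAL, "abcP08730-1xyz")); // P08730
        // Other accession: the second alternative keeps the isoform suffix.
        System.out.println(firstMatch(ORIGINAL, "abcA2BC19-3xyz")); // A2BC19-3
    }
}
```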

I suggest changing the regex to:
"[OPQ][0-9][A-Z0-9]{3}[0-9](-[0-9]+)?|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?"

Again in R:

> str_extract("abcP08730-1xyz","[OPQ][0-9][A-Z0-9]{3}[0-9](-[0-9]+)?|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?")
[1] "P08730-1"
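A Java check of the suggested pattern, as a minimal sketch: the class name `SuggestedPatternCheck` and the helper `firstMatch` are hypothetical, and `A2BC19-3` is an illustrative accession string. It also compares an equivalent formulation (an assumption of this sketch, not something proposed in the thread) that wraps the alternation in a non-capturing group so the optional isoform suffix is written only once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuggestedPatternCheck {
    // Suggested fix from the report: the optional isoform suffix
    // (-[0-9]+)? is repeated at the end of BOTH alternatives.
    static final Pattern SUGGESTED = Pattern.compile(
            "[OPQ][0-9][A-Z0-9]{3}[0-9](-[0-9]+)?"
            + "|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?");

    // Equivalent form (sketch): group the alternation, state the suffix once.
    static final Pattern GROUPED = Pattern.compile(
            "(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
            + "(?:-[0-9]+)?");

    // Hypothetical helper: first match found anywhere in the input, or null.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"abcP08730-1xyz", "abcA2BC19-3xyz"};
        for (String s : inputs) {
            // Both patterns should now keep the isoform for either accession type.
            System.out.println(firstMatch(SUGGESTED, s) + " " + firstMatch(GROUPED, s));
        }
    }
}
```

This prints `P08730-1 P08730-1` and then `A2BC19-3 A2BC19-3`. The grouped form is only a readability choice; both patterns return the same match text.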

Original issue reported on code.google.com by paulaste...@gmail.com on 29 Dec 2014 at 8:24

GoogleCodeExporter commented 8 years ago
Thanks for your suggestion. We will consider it. By the way, if you are
planning to use this library, note that we recently moved our projects to GitHub:
  - https://github.com/PRIDE-Toolsuite
  - https://github.com/PRIDE-Utilities
Best Regards and Happy New Year

Yasset

Original comment by ypriverol on 29 Dec 2014 at 10:24

GoogleCodeExporter commented 8 years ago
In retrospect, this will miss UniProt accessions without isoforms. Perhaps add
further alternatives to cover both cases, checking for isoforms first:
"[OPQ][0-9][A-Z0-9]{3}[0-9](-[0-9]+)?|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?|[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
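Whether the extra alternatives change anything can be checked directly. Because `(-[0-9]+)?` is optional, it can also match the empty string, so the first two alternatives may already accept a bare accession; in that case the last two alternatives are never reached. A minimal sketch comparing the two-alternative and four-alternative patterns on inputs with and without an isoform (the class name `ExtendedPatternCheck`, the helper `firstMatch`, and the `A2BC19` inputs are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtendedPatternCheck {
    // Two-alternative pattern suggested in the original report.
    static final String SUGGESTED =
            "[OPQ][0-9][A-Z0-9]{3}[0-9](-[0-9]+)?"
            + "|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?";
    // Four-alternative pattern from the follow-up comment: the same two
    // alternatives, followed by copies without the isoform suffix.
    static final String EXTENDED = SUGGESTED
            + "|[OPQ][0-9][A-Z0-9]{3}[0-9]"
            + "|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}";

    // Hypothetical helper: first match found anywhere in the input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"abcP08730-1xyz", "abcP08730xyz", "A2BC19", "A2BC19-3"};
        for (String s : inputs) {
            // Alternatives are tried left to right at each position, so the
            // trailing copies are only used if the first two fail.
            System.out.println(s + " -> " + firstMatch(SUGGESTED, s)
                    + " / " + firstMatch(EXTENDED, s));
        }
    }
}
```

On these inputs each line prints the same match for both patterns, for example `abcP08730xyz -> P08730 / P08730`.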

Original comment by paulaste...@gmail.com on 8 Jan 2015 at 10:06