bridgedb / BridgeDb

The BridgeDb Library source code
https://bridgedb.org/
Apache License 2.0

idRegexPattern for RefSeq missing "WP" prefix #45

Closed: randykerber closed this issue 7 years ago

randykerber commented 7 years ago

The new Uniprot-->RefSeq linkset file would not load into IMS. The IMS loader could not find any existing pattern that matches this RefSeq URI: http://purl.uniprot.org/refseq/WP_011154765.1

The Miriam record for RefSeq in the MiriamRegistry.ttl file contains this line:

      idot:idRegexPattern "^((AC|AP|NC|NG|NM|NP|NR|NT|NW|XM|XP|XR|YP|ZP)_\\d+|(NZ\\_[A-Z]{4}\\d+))(\\.\\d+)?$"^^<http://www.w3.org/2001/XMLSchema#string> ;

This pattern does not allow the ID part of the URI to begin with "WP".
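
For anyone who wants to reproduce the mismatch outside IMS, here is a minimal Python sketch. It assumes the ID is simply the last path segment of the URI (the `local_id` helper is hypothetical, not part of BridgeDb or IMS), and it writes the pattern with Turtle's doubled backslashes reduced to single ones.

```python
import re

# Pattern as quoted from MiriamRegistry.ttl, with the Turtle escaping ("\\d")
# reduced to plain regex escapes ("\d"). The original "\_" is written here as
# a plain "_", which matches the same character.
REFSEQ_PATTERN = re.compile(
    r"^((AC|AP|NC|NG|NM|NP|NR|NT|NW|XM|XP|XR|YP|ZP)_\d+|(NZ_[A-Z]{4}\d+))(\.\d+)?$"
)


def local_id(uri: str) -> str:
    """Hypothetical helper: treat everything after the last '/' as the ID."""
    return uri.rsplit("/", 1)[-1]


uri = "http://purl.uniprot.org/refseq/WP_011154765.1"
refseq_id = local_id(uri)

# True when the ID satisfies the registry pattern.
matches = REFSEQ_PATTERN.match(refseq_id) is not None
print(refseq_id, matches)
```

With the pattern as it stands, `matches` comes out `False` for the "WP" ID, while an ID such as `NP_000001.1` is accepted.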

randykerber commented 7 years ago

If a "WP" alternative is added to that idRegexPattern in MiriamRegistry.ttl, the Uniprot-->RefSeq linkset file will load into IMS.
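
A rough sketch of that change, again in Python rather than in the Turtle file itself: the pattern is rebuilt from a list of prefixes so the current and the patched version can be compared side by side. The `build_pattern` helper and the position of "WP" in the alternation are assumptions for illustration only; the prefix list is taken from the pattern quoted above.

```python
import re

# Two-letter prefixes from the idRegexPattern currently in MiriamRegistry.ttl.
CURRENT_PREFIXES = [
    "AC", "AP", "NC", "NG", "NM", "NP", "NR",
    "NT", "NW", "XM", "XP", "XR", "YP", "ZP",
]


def build_pattern(prefixes):
    """Hypothetical helper: rebuild the RefSeq pattern from a prefix list."""
    alternation = "|".join(prefixes)
    # "{{4}}" in the f-string becomes the literal quantifier "{4}".
    return re.compile(
        rf"^(({alternation})_\d+|(NZ_[A-Z]{{4}}\d+))(\.\d+)?$"
    )


current = build_pattern(CURRENT_PREFIXES)
patched = build_pattern(CURRENT_PREFIXES + ["WP"])

wp_id = "WP_011154765.1"
print("current:", current.match(wp_id) is not None)
print("patched:", patched.match(wp_id) is not None)

# IDs with the existing prefixes should still be accepted after the change.
for known in ("NM_000014.6", "NZ_ABCD01000001.1"):
    assert patched.match(known) is not None
```

The only difference between the two patterns is the extra "WP" alternative, so IDs that matched before continue to match.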

randykerber commented 7 years ago

The online Miriam page for RefSeq shows a URI regex that also does not include "WP": http://www.ebi.ac.uk/miriam/main/datatypes/MIR:00000039

This is despite the fact that this kind of RefSeq ID has apparently been in use since 2013: https://www.ncbi.nlm.nih.gov/news/06-11-2013-wp-refseqs/