emanuil-tolev closed this 12 years ago.
Wow, my mind is surprisingly vacant at times. The regex identification works just fine: the problem is that we've ALSO got a feature which, given a URL, checks whether the identifier actually is what we think it is.
E.g. the Digital Object Identifier regex may match a given identifier (such as both examples in the original report), but only ONE of them is ACTUALLY an assigned DOI. If you try resolving them both at http://dx.doi.org/, you will find that `10.1186/1758-2946-3-49` resolves fine. Changing an arbitrary digit (the `8` in `1758` to a `5`) results in a syntactically valid but unassigned DOI: `10.1186/1755-2946-3-49`.
So our service (correctly) decides that string is NOT a DOI.
Well, it's good to confirm that feature is working properly...
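The two-stage behaviour described in this comment (a syntactic regex test, then a check that the DOI is really assigned) can be sketched roughly as follows. This is a minimal sketch, not IDFind's code: `DOI_SYNTAX` is a simplified stand-in for the stored regex, and `fake_resolver` is a hypothetical offline substitute for actually resolving the DOI at http://dx.doi.org/ (a real service would make a network request there instead).

```python
import re
from typing import Callable

# Simplified syntactic DOI check (an assumption for illustration, not the
# regex IDFind stores): "10." + at least 4 registrant digits + "/" + a suffix.
DOI_SYNTAX = re.compile(r"^10\.\d{4,}/\S+$")


def identify_doi(candidate: str, resolves: Callable[[str], bool]) -> bool:
    """Return True only if the candidate looks like a DOI AND is actually assigned."""
    # Stage 1: cheap, offline syntax check. Both example DOIs pass this stage.
    if DOI_SYNTAX.match(candidate) is None:
        return False
    # Stage 2: ask the resolver whether this DOI is really assigned.
    return resolves(candidate)


# Hypothetical offline resolver standing in for a real lookup at
# http://dx.doi.org/; it only "knows" one assigned DOI.
ASSIGNED = {"10.1186/1758-2946-3-49"}


def fake_resolver(doi: str) -> bool:
    return doi in ASSIGNED


if __name__ == "__main__":
    for doi in ("10.1186/1758-2946-3-49", "10.1186/1755-2946-3-49"):
        print(doi, "syntax ok:", bool(DOI_SYNTAX.match(doi)),
              "is DOI:", identify_doi(doi, fake_resolver))
```

Passing the resolver in as a parameter keeps the syntactic stage testable offline and makes it clear which stage rejected a string like `10.1186/1755-2946-3-49`: it passes the regex and fails only the assignment check.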
When identifying a string using the tests stored in the index, IDFind seems to fail in a rather strange way.
If we take `10.1186/1758-2946-3-47`, the example from the front page, it is successfully matched by the `^((http:\/\/){0,1}dx.doi.org/|(http:\/\/){0,1}hdl.handle.net\/|doi:|info:doi:){0,1}(?P10..+\/.+)` regex we wrote at DevXS. If we then try to match `10.1186/1758-2946-3-49` (changing the last digit to 9), that works too. However, if we then try to match `10.1186/1755-2946-3-49` (changing `1758` to `1755`, so just 1 digit), it fails!
This shouldn't happen according to my reading of the regex, which allows `.+` at that point, so changing the digit from 8 to 5 shouldn't stop it from matching.
I'll investigate this later by compiling the regex with `re` in the REPL and trying to figure out what's wrong.
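A quick REPL-style check along those lines might look like this. It is a minimal sketch assuming Python's `re` module; the named group's name is missing from the regex as quoted in this issue, so `id` below is a placeholder, not necessarily what the stored regex used. Run against all three strings, the pattern accepts each one, including the one-digit change, which agrees with the closing comment at the top of this thread: the rejection comes from the resolution check, not from the regex.

```python
import re

# The DevXS DOI regex as quoted in the issue. The named group's name was lost
# from the quoted text, so "id" is a placeholder (an assumption).
DOI_RE = re.compile(
    r"^((http:\/\/){0,1}dx.doi.org/|(http:\/\/){0,1}hdl.handle.net\/|doi:|info:doi:){0,1}"
    r"(?P<id>10..+\/.+)"
)

candidates = [
    "10.1186/1758-2946-3-47",                    # example from the front page
    "10.1186/1758-2946-3-49",                    # last digit changed to 9
    "10.1186/1755-2946-3-49",                    # 1758 changed to 1755
    "http://dx.doi.org/10.1186/1758-2946-3-49",  # with the optional resolver prefix
]

for s in candidates:
    m = DOI_RE.match(s)
    # Show whether the pattern matched and what the identifier group captured.
    print(s, "->", m.group("id") if m else None)
```

Note that the `.` characters in the pattern are unescaped, so they match any character, and `.+` after the registrant prefix is loose enough that no single-digit change in the suffix can make the match fail.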