Closed pyrog closed 5 years ago
It is far better not to hard-wire most of the URL in this code, and instead to use the url_prefix and url_suffix parameters to supply it from the formatter URL.
Ok, I understand 😄
Note: same issue for P345, P6841, P6996… 😉
I suggest using another mechanism for that. Pass two parameters: a URL template containing %1, %2, … and a regex, where %1, %2, … will be replaced by the entries of the matches array filled in by preg_match().
We can't use $1 because it is already used by the formatter URL or third-party formatter URL.
The goal of the regex is not to validate $id, just to split it in order to set $redirect_url. (Checking IDs is the job of the format constraint.) So we could use a "simple" regex…
With this concept, the same code could work for properties 3563 and 3723, and even for 4033 and more…
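The substitution idea above can be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code (which uses PHP's preg_match()). The function name build_redirect_url, the explicit capture groups in the regex, and the sample ID "114-5000" are assumptions made for the example.

```python
import re
from typing import Optional


def build_redirect_url(url_template: str, regex: str, external_id: str) -> Optional[str]:
    """Split external_id with regex and fill %1, %2, ... in url_template.

    Hypothetical helper: the regex only splits the ID, it does not validate it.
    """
    match = re.fullmatch(regex, external_id)
    if match is None:
        return None  # the ID could not be split, so no redirect URL is built

    def substitute(placeholder: re.Match) -> str:
        index = int(placeholder.group(1))
        if index <= match.re.groups:
            # %N becomes the N-th captured group (%0 is the whole match)
            return match.group(index) or ""
        return placeholder.group(0)  # leave unknown placeholders untouched

    return re.sub(r"%(\d)", substitute, url_template)


template = (
    "https://msi.nga.mil/queryResults?publications/uscgll"
    "?volume=%1&featureNumber=%2&includeRemovals=false&output=html"
)
# "114-5000" is a made-up ID used only to show the split on "-"
redirect_url = build_redirect_url(template, r"(.*)-(.*)", "114-5000")
```

One caveat of this placeholder syntax: the template has to be URL-decoded before substitution, otherwise sequences such as %2F would be mistaken for the placeholder %2.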
Examples:
https://tools.wmflabs.org/wikidata-externalid-url/?url=http://%1/@%2&regex=.*@.*&id=$1
https://tools.wmflabs.org/wikidata-externalid-url/?url=https://msi.nga.mil/queryResults?publications/uscgll?volume=%1&featureNumber=%2&includeRemovals=false&output=html&regex=.*-.*&id=$1
To avoid issues with &, ?, =, … we could use a URL encoder to build the url parameter by hand, and call urldecode() inside your code.
https%3A%2F%2Fmsi.nga.mil%2FqueryResults%3Fpublications%2Fuscgll%3Fvolume%3D%251%26featureNumber%3D%252%26includeRemovals%3Dfalse%26output%3Dhtml
.*-.*
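As a quick check of the encoding step, here is a small Python sketch (Python stands in for PHP's urlencode()/urldecode() here, which is an assumption about how the tool would do it). It percent-encodes the NGA URL template so that &, ?, = and % survive as a single query parameter value, then decodes it back.

```python
from urllib.parse import quote, unquote

template = (
    "https://msi.nga.mil/queryResults?publications/uscgll"
    "?volume=%1&featureNumber=%2&includeRemovals=false&output=html"
)

# safe="" forces "/", ":" and friends to be encoded as well
encoded = quote(template, safe="")

# Decoding restores the template, including the %1 and %2 placeholders
decoded = unquote(encoded)
```

The encoded value is the same string as the encoded template shown above, and decoding it returns the original template unchanged.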
@pyrog yes, that would be a very general way to do it; I think the approach I would take on it would be to add a new parameter, 'regex=' as you suggest; handling existing entries as they currently are, but doing something special if the regex parameter is passed. Exactly what regexes to allow may be another question... Also, yes, it is necessary to urlencode = and & characters in the formatter URL (wikidata automatically URL-encodes the ID value it appends, which is another issue). See the formatter URL for P1323 for example.
Exactly what regexes to allow may be another question...
Yes.
Properties P1209, P6996, P1695… need two regexes and two URLs.
For MMSI P587, one could link to some national registers, so this property needs many URL/regex pairs:
22[6-8]\d{6}: French ships
316\d{6}: Canadian ships
For the Maritime mobile Access and Retrieval System (MARS), we need 4 pairs:
00\d{7}: CoastStation
111\d{6}: SearchAndRescueAircraft
99\d{7}: AidsToNavigation
[2-7]\d{8}: ShipStation
So, to be more universal, the code should accept arrays of URLs and regexes…
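A minimal sketch of the "arrays of URLs and regexes" idea, in Python rather than the tool's PHP. The four patterns are the MARS ones listed above. The example.org URL and the names MARS_RULES and pick_target are placeholders invented for the illustration, since the real MARS URLs are not given here.

```python
import re
from typing import Optional

# Ordered (regex, category) pairs; the first pattern that matches wins.
MARS_RULES = [
    (r"00\d{7}", "CoastStation"),
    (r"111\d{6}", "SearchAndRescueAircraft"),
    (r"99\d{7}", "AidsToNavigation"),
    (r"[2-7]\d{8}", "ShipStation"),
]


def pick_target(mmsi: str) -> Optional[str]:
    """Return a (hypothetical) URL for the first rule whose regex matches."""
    for pattern, category in MARS_RULES:
        if re.fullmatch(pattern, mmsi):
            # example.org is a stand-in, not the real MARS endpoint
            return f"https://example.org/mars/{category}?mmsi={mmsi}"
    return None  # no rule matched, so the caller falls back to default handling


ship_url = pick_target("227123456")
coast_url = pick_target("002123456")
```

Keeping the pairs in an ordered list matters once patterns overlap: the French-ship pattern 22[6-8]\d{6} and the generic ship-station pattern [2-7]\d{8} both match the same IDs, so whichever comes first decides the target.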
Hi,
I wrote a new version. Could you test and review it, please? It still needs to be rebased and the examples cleaned up, and I would like to add a form to URL-encode parameters...
Regards
@pyrog thanks - looks like it was not a super hard problem after all, I'm glad you tackled this! I've installed the latest version live with your changes, go ahead and try it out with any properties you like...
An NGA ID without a volume number is not valid. The format constraint of P3563 must be updated. All data should be cleaned up.