arthurpsmith / wikidata-tools

My collection of tools for interacting with the wikidata database

Add P3563 and P3723 (NGA and USCG Lists of lights) #18

Closed pyrog closed 5 years ago

pyrog commented 5 years ago

> For IDs that don't start with 11x- there is no volume, and as far as I can tell the proposed URL fails

No, just click on each "undefined" to get the beacon.

An NGA ID without a volume number is not valid. The format constraint of P3563 must be updated, and all the data should be cleaned up.

pyrog commented 5 years ago

> It is far better to have most of the URL not hard-wired in this code, but to use the url_prefix and url_suffix parameters to supply that from the formatter URL

Ok, I understand 😄

Note: same issue for P345, P6841, P6996… 😉

I suggest using another mechanism for that. Pass two parameters:

  1. a regex that splits the ID
  2. a URL containing e.g. %1, %2…, where %1, %2… will be replaced by the corresponding elements of the matches array filled in by preg_match()

We can't use $1 because it is already used by the formatter URL and the third-party formatter URL.

The goal of the regex is not to validate $id, just to split it in order to set $redirect_url. (Checking IDs is the job of the format constraint.) So we could use a "simple" regex…

With this concept, the same code could work for properties 3563 and 3723, and even 4033 and more…
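The regex-plus-template idea can be sketched as follows. This is only an illustration in Python, not the tool's own code: the function name `build_redirect_url`, the ID format, and the example.org URL are all made up here, and `re.search` stands in for preg_match().

```python
import re

def build_redirect_url(identifier, split_regex, url_template):
    """Split `identifier` with `split_regex`, then fill %1, %2, ... in `url_template`.

    Hypothetical helper: the regex only splits the ID, it does not validate it.
    """
    match = re.search(split_regex, identifier)
    if match is None:
        return None  # the ID does not fit the pattern; the caller picks a fallback

    def substitute(placeholder):
        index = int(placeholder.group(1))
        if index > match.re.groups:
            return placeholder.group(0)  # no such capture group: leave text as is
        return match.group(index) or ""  # unmatched optional group becomes ""

    return re.sub(r"%(\d+)", substitute, url_template)

# Made-up ID format and URL, for illustration only.
print(build_redirect_url("114-5678", r"^(\d+)-(\d+)$",
                         "https://example.org/vol/%1/light/%2"))
```

One caveat of this sketch: a template that already contains percent-escapes such as %2F would clash with the %1, %2… placeholder syntax, so the template would have to be stored unescaped.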

Examples:

To avoid issues with &, ?, =, … we could URL-encode the parameters when building the URL by hand, and call urldecode() on them inside your code.
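A rough Python sketch of that encode/decode round trip, using `urllib.parse.quote` and `unquote` in place of PHP's urlencode()/urldecode(). The parameter names `regex` and `url` and the example.org address are assumptions made for this illustration, not the tool's actual interface.

```python
from urllib.parse import quote, unquote

def encode_params(regex, url_template):
    """Caller side: percent-encode both values so &, ? and = survive in a query string."""
    return "regex=" + quote(regex, safe="") + "&url=" + quote(url_template, safe="")

def decode_params(query):
    """Tool side: split on the outer & and = first, then decode each value."""
    pairs = (pair.split("=", 1) for pair in query.split("&"))
    return {name: unquote(value) for name, value in pairs}

# Hypothetical values, for illustration only.
query = encode_params(r"^(\d+)-(\d+)$", "https://example.org/lookup?vol=%1&id=%2")
params = decode_params(query)
print(params["regex"])
print(params["url"])
```

Because every `&`, `?` and `=` inside the values is encoded before the query string is assembled, the only `&` left in the query is the one separating the two parameters.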

arthurpsmith commented 5 years ago

@pyrog yes, that would be a very general way to do it; the approach I would take is to add a new parameter, 'regex=' as you suggest, handling existing entries as they currently are, but doing something special if the regex parameter is passed. Exactly what regexes to allow may be another question... Also, yes, it is necessary to urlencode = and & characters in the formatter URL (Wikidata automatically URL-encodes the ID value it appends, which is another issue). See the formatter URL for P1323 for example.

pyrog commented 5 years ago

> Exactly what regexes to allow may be another question...

Yes.

Properties P1209, P6996, P1695… need two regexes and two URLs.

For MMSI (P587), one could link to several national registers, so this property needs many URL/regex pairs.

For the Maritime mobile Access and Retrieval System (MARS), we need 4 pairs:

So, to be more universal, the code should accept arrays of URLs and regexes…
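One way to read "arrays of URLs and regexes" is an ordered list of pairs where the first regex that matches decides the URL. The Python sketch below assumes that first-match-wins rule; the function name `first_matching_url`, the patterns, and the example.org URLs are invented for illustration and are not real register rules.

```python
import re

def fill_template(match, url_template):
    """Replace %1, %2, ... with the capture groups of `match`."""
    def substitute(placeholder):
        index = int(placeholder.group(1))
        if index > match.re.groups:
            return placeholder.group(0)  # unknown placeholder: leave untouched
        return match.group(index) or ""
    return re.sub(r"%(\d+)", substitute, url_template)

def first_matching_url(identifier, rules):
    """Try each (regex, url_template) pair in order; the first match wins."""
    for pattern, url_template in rules:
        match = re.search(pattern, identifier)
        if match:
            return fill_template(match, url_template)
    return None  # no pair applies to this ID

# Made-up rules, for illustration only.
rules = [
    (r"^A-(\d+)$", "https://example.org/a/%1"),
    (r"^B-(\d+)$", "https://example.org/b/%1"),
]
print(first_matching_url("B-42", rules))
```

Ordering matters in this sketch: if two patterns can match the same ID, the more specific one has to come first.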

pyrog commented 5 years ago

Hi,

I wrote a new version. Could you test and review it, please? It still needs to be rebased and the examples cleaned up, and I would like to add a form to URL-encode the parameters...

Regards

arthurpsmith commented 5 years ago

@pyrog thanks - looks like it was not a super hard problem after all, I'm glad you tackled this! I've installed the latest version live with your changes, go ahead and try it out with any properties you like...

pyrog commented 5 years ago

Thanks 😄

Tested with: