Open armijnhemel opened 1 year ago
Can you explain more? I would like to work on this.
(Tagging @pombredanne for additional background information)
The way I envision it, purldb would have an optional field for extra data that could include the Wikidata identifier. Wikidata identifiers are short alphanumeric identifiers starting with 'Q'. The one for bash is 'Q189248' (as linked above). So if I query purldb for bash, or otherwise get results for bash (not sure how that will work, @pombredanne can probably clarify), and there is a known Wikidata identifier, it would be returned. I assume the purldb data model allows for this kind of extra data.
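To make the idea of an optional field concrete, here is a minimal sketch. It is not purldb's actual data model: `PackageRecord`, `wikidata_id`, and the `pkg:generic/bash` purl are hypothetical names used only for illustration. The only grounded part is the identifier shape ('Q' followed by digits) and the item URL pattern seen in the links in this thread.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Wikidata item identifiers are "Q" followed by a positive integer, e.g. Q189248.
_QID_RE = re.compile(r"Q[1-9][0-9]*")


def is_wikidata_qid(value: str) -> bool:
    """Return True if value has the shape of a Wikidata item identifier."""
    return _QID_RE.fullmatch(value) is not None


@dataclass
class PackageRecord:
    """Hypothetical stand-in for a purldb package entry (assumption)."""

    purl: str
    # Optional: most packages are expected to have no Wikidata item.
    wikidata_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject malformed identifiers early instead of storing them.
        if self.wikidata_id is not None and not is_wikidata_qid(self.wikidata_id):
            raise ValueError(f"not a Wikidata item identifier: {self.wikidata_id!r}")

    def wikidata_url(self) -> Optional[str]:
        """Build the item page URL, or None when no identifier is known."""
        if self.wikidata_id is None:
            return None
        return f"https://www.wikidata.org/wiki/{self.wikidata_id}"


bash = PackageRecord(purl="pkg:generic/bash", wikidata_id="Q189248")
unknown = PackageRecord(purl="pkg:generic/some-obscure-tool")
```

With this shape, a consumer can check `bash.wikidata_url()` and simply skip packages where it returns `None`.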
On the indexing side things are a little bit murkier. Not every package out there will have a Wikidata item associated with it. In fact, I am expecting it to be fairly rare. There are a few methods I can think of:
It looks like there are currently a bit over 15,000 entries in Wikidata that have the property "source code repository". After extracting the Wikidata identifier from the search results, the data for the entry itself can be queried and then cross-correlated with any existing data in purldb so that data can be enhanced.
I manually ran a query for the property "source code repository" in Wikidata and downloaded the results as JSON. I have added them here. This would just be a first step.
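As a rough sketch of that first step, the downloaded results could be turned into a lookup table from repository URL to Wikidata identifier. This assumes the JSON is a flat list of objects with an `item` key (the entity URI) and a `repo` key (the repository URL). Those key names depend on how the query was written and are assumptions here, as are the helper names and the URL normalization heuristic. The sample row is illustrative and not taken from the actual download.

```python
import json
from typing import Dict, List, Optional

ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def qid_from_entity_uri(uri: str) -> Optional[str]:
    """Extract 'Q189248' from 'http://www.wikidata.org/entity/Q189248'."""
    if not uri.startswith(ENTITY_PREFIX):
        return None
    qid = uri[len(ENTITY_PREFIX):]
    return qid if qid.startswith("Q") and qid[1:].isdigit() else None


def normalize_repo_url(url: str) -> str:
    """Heuristic normalization so trivially different URLs compare equal."""
    url = url.strip().lower()
    for scheme in ("https://", "http://", "git://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    url = url.rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


def build_repo_index(rows: List[dict]) -> Dict[str, str]:
    """Map normalized repository URL -> Wikidata identifier."""
    index: Dict[str, str] = {}
    for row in rows:
        qid = qid_from_entity_uri(row.get("item", ""))
        repo = row.get("repo")
        if qid and repo:
            # Keep the first identifier seen for a given repository.
            index.setdefault(normalize_repo_url(repo), qid)
    return index


def lookup_qid(index: Dict[str, str], repo_url: str) -> Optional[str]:
    """Return the Wikidata identifier for a repository URL, if known."""
    return index.get(normalize_repo_url(repo_url))


# Illustrative input only; the key names are assumptions.
sample = json.loads(
    '[{"item": "http://www.wikidata.org/entity/Q189248",'
    ' "repo": "https://git.savannah.gnu.org/git/bash.git"}]'
)
index = build_repo_index(sample)
```

A visitor/mapper could then call `lookup_qid(index, url)` with whatever repository URL purldb already holds for a package and record the identifier when there is a match. Lowercasing the whole URL is a simplification that may cause false matches on hosts with case-sensitive paths.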
@pombredanne would this work best as an "improver"?
re: "would this work best as an "improver"?"
That's going to be a regular visitor/mapper here IMHO
More of the data available in Wikidata could be useful for cross-referencing. Looking at, for example, https://www.wikidata.org/wiki/Q11246433 there is:
and perhaps a few other useful things.
For quite a few open source packages there is a Wikidata identifier, for example for GNU bash: https://www.wikidata.org/wiki/Q189248
It could be interesting to add this information, if available, so that other tools could use the identifier to, for example, look up or display data from Wikipedia. I am not suggesting that the contents of Wikipedia pages should be indexed, only that Wikidata identifiers be recorded when available.