Closed by jhpoelen 4 months ago
Currently, the Wikidata dump is about 83.5G, which is too large to fit into Zenodo.
Suggest including only items that reference Taxon (https://www.wikidata.org/wiki/Q16521).
Sketch of workflow -
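#!/bin/bash
#
# Streams Wikidata taxon items (or any item mentioning https://www.wikidata.org/wiki/Q16521)
# from the latest data dump as line-delimited JSON (one JSON object per line).
#
# grep keeps lines that mention Q16521 but not longer ids such as Q165210;
# sed strips the trailing comma that separates entries in the dump's JSON array.
#
curl --silent "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2" \
  | bunzip2 \
  | grep -E "Q16521[^0-9]" \
  | sed 's/,$//' \
  | bzip2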
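To see what the grep and sed steps do without downloading the full dump, here is a minimal sketch run on a few made-up lines in the same one-object-per-line layout. The sample records and the function name `filter_taxa` are illustrative assumptions, not real Wikidata content.

```shell
#!/bin/bash
# Hypothetical helper: the same grep/sed steps as in the workflow,
# reading line-delimited JSON from stdin.
filter_taxa() {
  grep -E "Q16521[^0-9]" | sed 's/,$//'
}

# Made-up sample in the dump's layout: "[" on the first line, one object
# per line with a trailing comma, "]" on the last line.
sample='[
{"id":"Q1","claims":{"P31":[{"id":"Q16521"}]}},
{"id":"Q2","claims":{"P31":[{"id":"Q165211"}]}},
{"id":"Q3","claims":{"P31":[{"id":"Q5"}]}}
]'

# Only the Q1 line survives: Q2 mentions the longer id Q165211, and Q3
# does not mention Q16521 at all.
printf '%s\n' "$sample" | filter_taxa
```

Because the filter is a plain text match, it also keeps items that merely mention Q16521 anywhere in their JSON, which is the "items containing" case noted in the script header.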
Hey @daniel-mietchen,
would you happen to know how to translate a Wikimedia URL like
into a link that renders a JPG?
PS I've dropped indexing the Wikidata taxon images until we develop a method to point to an image (or image rendering link) directly.
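One possible approach, assuming the URL in question is a Wikimedia Commons `File:` page: Commons provides a `Special:FilePath/<file name>` page that redirects to the underlying media file. The sketch below only rewrites the URL string. The helper name `commons_file_url` and the example file name are placeholders, and this has not been checked against the taxon images.

```shell
#!/bin/bash
# Hypothetical helper: turn a Commons "File:" page URL into a
# Special:FilePath URL, which redirects to the media file itself.
commons_file_url() {
  local page_url="$1"
  # Keep everything after ".../File:" (assumes a File: page URL).
  local name="${page_url##*/File:}"
  printf 'https://commons.wikimedia.org/wiki/Special:FilePath/%s\n' "$name"
}

# Placeholder file name, for illustration only.
commons_file_url "https://commons.wikimedia.org/wiki/File:Example.jpg"
```

The resulting link could then be stored in place of the page URL, so that a client following redirects receives the image bytes directly.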
A first pass at implementing an offline-enabled wikidata taxon matcher -
echo -e "\tElymus repens"\
| nomer append\
--include-header wikidata\
| mlr --itsvlite --oxtab cat
produced -
providedExternalId
providedName Elymus repens
relationName HAS_ACCEPTED_NAME
resolvedExternalId WD:Q276262
resolvedName Elymus repens
resolvedAuthorship
resolvedRank WD:Q7432
resolvedCommonNames Gewöhnliche Quecke @de | quackgrass @en | niittyjuola @fi | 偃麦草 @zh
resolvedPath Spermatophytes | Magnoliophyta | Liliopsida | Commelinidae | Cyperales | Poaceae | Pooideae | Triticeae | Elymus | Elymus repens
resolvedPathIds WD:Q25814 | WD:Q14562931 | WD:Q1147601 | WD:Q1115272 | WD:Q1860104 | WD:Q43238 | WD:Q4662262 | WD:Q148694 | WD:Q1072892 | WD:Q276262
resolvedPathNames WD:Q3491997 | WD:Q38348 | WD:Q37517 | WD:Q5867051 | WD:Q36602 | WD:Q35409 | WD:Q164280 | WD:Q227936 | WD:Q34740 | WD:Q7432
resolvedPathAuthorships | | | | | | | | |
resolvedExternalUrl https://www.wikidata.org/wiki/Q276262
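In this output, resolvedPath and resolvedPathIds appear to line up position by position (the last entries are Elymus repens and WD:Q276262). Assuming that holds, a minimal sketch for pairing them could look like the following. The helper name `pair_path` is hypothetical, and the input is a shortened copy of the values above.

```shell
#!/bin/bash
# Hypothetical helper: print "id<TAB>name" for each position of two
# " | "-separated lists, such as resolvedPathIds and resolvedPath.
pair_path() {
  awk -v ids="$1" -v names="$2" 'BEGIN {
    n = split(ids, id, " [|] ")      # split on " | "
    split(names, name, " [|] ")
    for (i = 1; i <= n; i++) print id[i] "\t" name[i]
  }'
}

# Last five entries of the path shown above.
pair_path "WD:Q43238 | WD:Q4662262 | WD:Q148694 | WD:Q1072892 | WD:Q276262" \
          "Poaceae | Pooideae | Triticeae | Elymus | Elymus repens"
```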
Note that non-Wikidata identifiers are also supported if they are known to Wikidata,
e.g.,
echo -e "ITIS:512839"\
| nomer append --include-header wikidata\
| mlr --itsvlite --oxtab cat
providedExternalId ITIS:512839
relationName SYNONYM_OF
resolvedExternalId WD:Q276262
resolvedName Elymus repens
resolvedAuthorship
resolvedRank WD:Q7432
resolvedCommonNames Gewöhnliche Quecke @de | quackgrass @en | niittyjuola @fi | 偃麦草 @zh
resolvedPath Spermatophytes | Magnoliophyta | Liliopsida | Commelinidae | Cyperales | Poaceae | Pooideae | Triticeae | Elymus | Elymus repens
resolvedPathIds WD:Q25814 | WD:Q14562931 | WD:Q1147601 | WD:Q1115272 | WD:Q1860104 | WD:Q43238 | WD:Q4662262 | WD:Q148694 | WD:Q1072892 | WD:Q276262
resolvedPathNames WD:Q3491997 | WD:Q38348 | WD:Q37517 | WD:Q5867051 | WD:Q36602 | WD:Q35409 | WD:Q164280 | WD:Q227936 | WD:Q34740 | WD:Q7432
resolvedPathAuthorships | | | | | | | | |
resolvedExternalUrl https://www.wikidata.org/wiki/Q276262
While working towards addressing a misaligned taxon reported in https://github.com/globalbioticinteractions/globalbioticinteractions/issues/968 by @kbseah, a first version of an offline-enabled Wikidata taxon name alignment matcher was introduced in Nomer v0.5.11.
as related to #146