Wikidata / Wikidata-Toolkit-Examples

Examples showing how to use Wikidata Toolkit as a Maven library in your project
https://www.mediawiki.org/wiki/Wikidata_Toolkit
Apache License 2.0

RDF Exports #4

Open Livvi opened 5 years ago

Livvi commented 5 years ago

Hi,

I am trying to create an RDF Export from a current Wikidata dump (20181105).

First I tried to use the toolkit client (v0.8.0) and I always got 31 triples, no matter what parameters I tried to use.

Now I am using version 0.9.0 of the toolkit in Eclipse, but I am getting some warnings and errors.

One warning I am encountering for several language codes is:
Unknown Wikimedia language code "inh". Using this code in RDF now, but this might be wrong.

And for various properties I get the errors:
Count not export SomeValueSnak for property P1971: OWL range not known.
or
Could not fetch datatype of http://www.wikidata.org/entity/P883. Assuming type http://wikiba.se/ontology#String

Furthermore, I am trying to filter the data by English and German using setLanguageFilter, but it has no effect. I added the following to the RdfSerializationExample, but I get the same number of triples with or without it:

import java.util.HashSet;
import java.util.Set;

Set<String> languageSet = new HashSet<>();
languageSet.add("en");
languageSet.add("de");
dumpProcessingController.setLanguageFilter(languageSet);

Tpt commented 5 years ago

Hi! Thank you for the report!

Wikidata Toolkit v0.8 indeed does not work with recent Wikidata dumps.

One Warning I am encountering for several language codes is: Unknown Wikimedia language code "inh". Using this code in RDF now, but this might be wrong.

It just means that "inh" is not in the conversion dictionary from Wikimedia language codes to proper BCP 47 language codes, because Wikidata was not using it at the time of the v0.9 release. You can safely ignore this warning. The code is going to be added in the next release.
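
To illustrate what such a conversion dictionary with a pass-through fallback could look like, here is a minimal sketch in plain Java. It is not the toolkit's actual code: the class name LanguageCodeSketch, the method toBcp47 and the four table entries are assumptions chosen for illustration. Only the warning text is taken from the report above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the toolkit's real class or mapping table. */
final class LanguageCodeSketch {

    // Hypothetical, deliberately tiny subset of a Wikimedia-to-BCP 47 table.
    private static final Map<String, String> BCP47_BY_WIKIMEDIA_CODE = new HashMap<>();
    static {
        BCP47_BY_WIKIMEDIA_CODE.put("en", "en");
        BCP47_BY_WIKIMEDIA_CODE.put("de", "de");
        BCP47_BY_WIKIMEDIA_CODE.put("zh-yue", "yue");
        BCP47_BY_WIKIMEDIA_CODE.put("zh-min-nan", "nan");
    }

    /** Returns the mapped code, or the input unchanged (with a warning) if it is unknown. */
    static String toBcp47(String wikimediaCode) {
        String known = BCP47_BY_WIKIMEDIA_CODE.get(wikimediaCode);
        if (known != null) {
            return known;
        }
        // Unknown code: warn, then pass the code through as-is.
        System.err.println("Unknown Wikimedia language code \"" + wikimediaCode
                + "\". Using this code in RDF now, but this might be wrong.");
        return wikimediaCode;
    }

    public static void main(String[] args) {
        System.out.println(toBcp47("zh-yue")); // mapped entry
        System.out.println(toBcp47("inh"));    // not in the table: warning, passed through
    }
}
```

With a fallback of this kind, a missing entry still produces output that uses the Wikimedia code verbatim, which is why a warning like this one does not stop the export.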

Could not fetch datatype of http://www.wikidata.org/entity/P883. Assuming type http://wikiba.se/ontology#String

That is because this property has been deleted, so the library is not able to fetch its datatype from the API: https://www.wikidata.org/wiki/Property:P883

And for various properties I get the errors: Count not export SomeValueSnak for property P1971: OWL range not known.

It looks like a bug. I have reported it in a specific issue: https://github.com/Wikidata/Wikidata-Toolkit/issues/405

Furthermore I am trying to filter the data by english and german using setLanguageFilter, but it has no effect.

It's strange. I'm going to investigate it.
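
For reference, the effect one would expect from a language filter can be sketched independently of the toolkit. The snippet below is plain Java and makes no claim about how setLanguageFilter is implemented: the class LanguageFilterSketch, the method filterByLanguage and the sample labels are all hypothetical. It only shows that restricting terms to "en" and "de" should drop every other language and therefore reduce the amount of exported data.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; does not use or mirror the toolkit's API. */
final class LanguageFilterSketch {

    /** Keeps only the entries whose language code (the map key) is in the given set. */
    static Map<String, String> filterByLanguage(Map<String, String> labelsByLanguage,
                                                Set<String> languages) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : labelsByLanguage.entrySet()) {
            if (languages.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up labels for a single entity, keyed by language code.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("en", "Berlin");
        labels.put("de", "Berlin");
        labels.put("fr", "Berlin");

        Set<String> languageSet = new HashSet<>();
        languageSet.add("en");
        languageSet.add("de");

        // Only the "en" and "de" entries remain after filtering.
        System.out.println(filterByLanguage(labels, languageSet).keySet());
    }
}
```

Comparing the triple count of a filtered and an unfiltered export, as done in the report, is a reasonable check: if the filter were applied, the counts should differ.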

Livvi commented 5 years ago

Thank you very much for the fast response!

When using Maven, how do I install the newest version of the toolkit with the bugfix?

Tpt commented 5 years ago

Just change the version number to the latest one (0.9) in the pom.xml file.

Livvi commented 5 years ago

Thanks!

In the meantime I ran into another problem: it seems that the toolkit does not support the type 'form' yet. https://github.com/Wikidata/Wikidata-Toolkit/issues/407