blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0

Wikidata blazegraph | set labels preferred language #90

Open alsora opened 6 years ago

alsora commented 6 years ago

Hi,

I have reconstructed the Wikidata graph using Blazegraph. All the standard queries work fine.

Now I would like to set a preferred language for the entity labels, as described here: https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Extensions

PREFIX wikibase: <http://wikiba.se/ontology#>
SERVICE wikibase:label {
  bd:serviceParam wikibase:language "en" .
 }

If I specify any language other than English ("en"), all queries return empty results.

Do you know what I should do to enable setting different languages?

Thank you

thompsonbry commented 6 years ago

Adding Stas. Bryan

alsora commented 6 years ago

@thompsonbry I'm sorry, what do you mean?

thompsonbry commented 6 years ago

Stas is the technical lead on wikidata and developed the label functionality. He is in a better position to answer this question. Bryan

smalyshev commented 6 years ago

@alsora English is not special in any way for WDQS, so labels should work the same in any language. It would be nice to see a full example query to figure out what is going on. You may also want to check that the labels actually exist, e.g. ?item rdfs:label ?label. FILTER(lang(?label) = 'fr') and the like.
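A minimal sketch of the check suggested above, written as a small Python helper that only builds the SPARQL text. The function name build_label_language_query is hypothetical, and the query counts the rdfs:label values of one item per language tag, which is one possible way to see which languages were actually loaded. The wd: and rdfs: prefixes are declared explicitly in case the local installation does not predefine them.

```python
# Standard prefixes for Wikidata entities and RDF Schema.
PREFIXES = (
    "PREFIX wd: <http://www.wikidata.org/entity/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def build_label_language_query(item_id: str) -> str:
    """Return a SPARQL query that counts the labels of one item per language.

    Hypothetical helper for illustration; item_id is a Wikidata ID such as "Q1490".
    """
    return (
        PREFIXES
        + "SELECT ?lang (COUNT(?label) AS ?count) WHERE {\n"
        + f"  wd:{item_id} rdfs:label ?label .\n"
        + "  BIND(LANG(?label) AS ?lang)\n"
        + "}\n"
        + "GROUP BY ?lang\n"
        + "ORDER BY ?lang"
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into the Blazegraph workbench.
    print(build_label_language_query("Q1490"))
```

If the result lists only "en", the non-English labels are most likely absent from the loaded data rather than hidden by the label service.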

alsora commented 6 years ago

@smalyshev ok thank you!

This is my query:

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ja". }

  VALUES (?item) {
    (wd:Q1490)
  }

  #?item rdfs:label ?itemLabel. FILTER(lang(?itemLabel) = 'ja')
}
LIMIT 10

I have tried both options for selecting the Japanese label (specifying the language via bd:serviceParam and filtering on the label language).

This query executes fine on https://query.wikidata.org/ and the entity I'm looking for is the city of Tokyo, which obviously has a Japanese label =)

When I run this query on my Blazegraph instance I get the following output:

smalyshev commented 6 years ago

Looks like you're missing non-English labels. You probably used the "single language" option when munging/loading data.

alsora commented 6 years ago

Ok, thank you a lot!

I didn't notice that the -l parameter was set to "en" for the munge.sh script.

alsora commented 6 years ago

Hi @smalyshev,

I have repeated the whole process with the correct parameters. However, I'm still getting strange results.

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ja". }

  VALUES (?item) {
    (wd:Q1490)
  }
}
LIMIT 10

Now when I run this query in the Blazegraph workbench I finally see a result, but the itemLabel is not displayed correctly.

ItemLabel: ������������

As a result, I am not able to add constraints on the item labels inside the query.

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ja". }

  VALUES (?item) {
    (wd:Q1744)
  }

  ?item rdfs:label ?itemLabel.
  FILTER(REGEX(?itemLabel, "マドンナ"))
}
LIMIT 10

Do you know how I can properly set the encoding for labels in these Unicode languages?
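The thread does not say where the encoding goes wrong, so one thing worth ruling out is the client side. The sketch below is a hedged example of sending a query through the standard SPARQL protocol with the request body and the response both handled explicitly as UTF-8. The endpoint URL is an assumption (a typical local WDQS Blazegraph address), and build_request and run_query are hypothetical helper names. The query filters on rdfs:label directly and uses CONTAINS instead of the REGEX call and label service from the question.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Blazegraph endpoint; adjust to your installation.
ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"

QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  VALUES ?item { wd:Q1744 }
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "ja" && CONTAINS(STR(?label), "マドンナ"))
}
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol POST request whose body is UTF-8 percent-encoded."""
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes by default,
    # so the resulting string is plain ASCII.
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Accept": "application/sparql-results+json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and decode the JSON response explicitly as UTF-8."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running endpoint at ENDPOINT.
    for row in run_query(ENDPOINT, QUERY)["results"]["bindings"]:
        print(row["item"]["value"], row["label"]["value"])
```

If the labels come back intact this way but still show as replacement characters in the workbench, the stored data is probably fine and the problem lies in how the page or terminal decodes the response. If they are broken here too, the encoding used while munging or loading the dump is the more likely place to look.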