DDMAL / linkedmusic-queries

Various methods to query our data lake, e.g., Virtuoso graphs
MIT License

Not able to query lots of strings in RDF(TheSession) #9

Open candlecao opened 2 months ago

candlecao commented 2 months ago

For example, you cannot query the session named "Hurley’s Irish Pub" with:

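PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?session
WHERE {
  ?session wdt:P2561 "Hurley’s Irish Pub" .
  ?session rdf:type <https://thesession.org/sessions> .
}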

But the query succeeds if you add "@en": ?session wdt:P2561 "Hurley’s Irish Pub"@en . The cause is the following modification: [image]
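
The behaviour described above follows from how RDF compares literals: a plain string and a language-tagged string with the same characters are different terms, so a triple pattern only matches when both the text and the tag agree. The following is a toy model in plain Python, not a description of how Virtuoso works internally. The `Literal` class, the `session:1` identifier, and the helper names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Toy RDF literal: equal only if lexical form AND language tag match."""
    lexical: str
    lang: Optional[str] = None


NAME = "wdt:P2561"

# Hypothetical one-triple "graph": the name was stored with an @en tag.
store = {
    ("session:1", NAME, Literal("Hurley’s Irish Pub", "en")),
}


def match_exact(store, predicate, obj):
    """Mimic a triple pattern: the object must be the identical term."""
    return [s for (s, p, o) in store if p == predicate and o == obj]


def match_lexical(store, predicate, text):
    """Compare only the characters, ignoring any language tag."""
    return [s for (s, p, o) in store if p == predicate and o.lexical == text]


untagged = match_exact(store, NAME, Literal("Hurley’s Irish Pub"))
tagged = match_exact(store, NAME, Literal("Hurley’s Irish Pub", "en"))
by_text = match_lexical(store, NAME, "Hurley’s Irish Pub")

print(untagged)  # []
print(tagged)    # ['session:1']
print(by_text)   # ['session:1']
```

Comparing only the lexical form corresponds roughly to filtering on STR(?name) in SPARQL, which may be an option when the tag is not known in advance.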

candlecao commented 2 months ago

I don't quite agree with this rendering, because: (1) we cannot guarantee that all of these strings are actually in English; (2) it places an extra burden on LLM2SPARQL and makes its output less accurate; (3) we could treat English as the default language, so that no tag needs to be specified, and add tags such as @zh for Chinese or @fr for French only for other languages.

@fujinaga Hi, Ich, do you agree?

Yueqiao12Zhang commented 1 month ago

@fujinaga

fujinaga commented 1 month ago

There should always be a language tag in every string. We can always instruct ChatGPT to append the language tags in SPARQL queries.
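
As an alternative or complement to prompting, generated queries could be post-processed so that untagged string literals receive a default tag. The following is a minimal sketch under several assumptions: the function name `add_default_lang` is hypothetical, only double-quoted literals are handled, and every untagged literal is tagged, including ones inside FILTER or REGEX calls where a tag may not be wanted.

```python
import re

# A double-quoted literal, optionally followed by a language tag (@en, @zh-Hant)
# or a datatype (^^xsd:string). Capturing the suffix lets us skip literals
# that are already tagged or typed.
LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"'
    r'(@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^\S+)?'
)


def add_default_lang(query: str, lang: str = "en") -> str:
    """Append @lang to every double-quoted literal that has no tag or datatype."""
    def fix(match):
        if match.group(1):          # already tagged or typed: leave untouched
            return match.group(0)
        return f"{match.group(0)}@{lang}"
    return LITERAL.sub(fix, query)


pattern = '?session wdt:P2561 "Hurley’s Irish Pub" .'
print(add_default_lang(pattern))
# ?session wdt:P2561 "Hurley’s Irish Pub"@en .
```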

Yueqiao12Zhang commented 1 month ago

Ok. Does this mean that I have to automatically detect the language of every string in my script?

fujinaga commented 1 month ago

No. For each database we import, we should know which language it is in. For now, you can always default to @en. If we store chant text from CantusDB, that would be in Latin.
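
One way to apply a per-database default in an import script is a small lookup table, sketched below. The dictionary keys (`thesession`, `cantusdb`) and the function names are hypothetical; `en` and `la` are the standard subtags for English and Latin.

```python
# Hypothetical mapping from database name to its default language subtag.
DEFAULT_LANG = {
    "thesession": "en",  # English
    "cantusdb": "la",    # Latin chant text
}


def escape(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def tagged_literal(value: str, database: str) -> str:
    """Render a string as a language-tagged literal for the given database.

    Raises KeyError for an unknown database, so a missing default is noticed
    instead of silently falling back to a guess.
    """
    lang = DEFAULT_LANG[database]
    return f'"{escape(value)}"@{lang}'


print(tagged_literal("Hurley’s Irish Pub", "thesession"))
# "Hurley’s Irish Pub"@en
```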

ahankinson commented 1 month ago

There are several codes that you can use for non-coded languages:

Type: script
Subtag: Zyyy
Description: Code for undetermined script
Added: 2005-10-16
%%
Type: script
Subtag: Zzzz
Description: Code for uncoded script
Added: 2005-10-16
%%
Type: language
Subtag: und
Description: Undetermined
Added: 2005-10-16
Scope: special

https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

Note: "und should not be used unless a language tag is required and language information is not available or cannot be determined. Omitting the language tag (where permitted) is preferred. This subtag may also be useful when matching language tags in certain situations. Where xml:lang="" is allowed by the markup, it is better to use that rather than und"

From a search for "und" here: https://r12a.github.io/app-subtags/

See: https://www.w3.org/International/questions/qa-no-language#undetermined
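
To make the guidance above concrete, here is a minimal sketch of choosing a tag when the language may be unknown: use the known language if there is one, omit the tag where that is permitted, and fall back to `und` only when a tag is required. The function names and the `tag_required` parameter are hypothetical.

```python
from typing import Optional


def language_tag(lang: Optional[str], tag_required: bool) -> Optional[str]:
    """Pick a tag: the known language, else 'und' only if a tag is mandatory."""
    if lang:
        return lang
    return "und" if tag_required else None


def render(value: str, lang: Optional[str] = None, tag_required: bool = True) -> str:
    """Render a literal with or without a language tag."""
    tag = language_tag(lang, tag_required)
    quoted = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return quoted if tag is None else f"{quoted}@{tag}"


print(render("Salve Regina", "la"))             # "Salve Regina"@la
print(render("Untitled"))                       # "Untitled"@und
print(render("Untitled", tag_required=False))   # "Untitled"
```

If every string in the graph must carry a tag, `und` keeps untagged and unknown-language values distinguishable from strings that are known to be English.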

candlecao commented 1 month ago

Thank you, @ahankinson. Could you please give me some concrete examples, with explanations, that use these tags in RDF?

ahankinson commented 1 month ago

Could you please give me some concrete examples, with explanations, that use these tags in RDF?

No, because you can use Google as well as I can. :-)