Daniel-Mietchen / ideas

A dumping ground for halfbaked ideas, some of which will hopefully be worked on soon

Use existing P1932 (stated as) statements to help disambiguate P2093 (author name string) statements #488

Open Daniel-Mietchen opened 6 years ago

Daniel-Mietchen commented 6 years ago

That is, if an item has a P2093 (author name string) value of "Smith J. W. X." (or a variant such as "Smith J W X"), the tool would look for paper items that have a P50 (author) statement with a P1932 (stated as) qualifier of "Smith J. W. X." and suggest those authors as candidates for converting the P2093 statements to P50 ones.
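To make the matching idea concrete, here is a minimal Python sketch. It assumes the (author item, "stated as" value) pairs have already been fetched from Wikidata; the function names, the normalization rule (drop periods, collapse whitespace, ignore case) and the placeholder item IDs are all hypothetical and not taken from any existing tool.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Drop periods, collapse whitespace, and case-fold a name string."""
    no_dots = name.replace(".", " ")
    return re.sub(r"\s+", " ", no_dots).strip().casefold()


def build_index(stated_as_pairs):
    """Map each normalized 'stated as' value to the author items that use it."""
    index = defaultdict(set)
    for author_item, stated_as in stated_as_pairs:
        index[normalize(stated_as)].add(author_item)
    return index


def suggest_candidates(author_name_string, index):
    """Return author items whose 'stated as' value matches the name string."""
    return sorted(index.get(normalize(author_name_string), set()))


# Hypothetical (author item, P1932 value) pairs; the IDs are placeholders.
stated_as_pairs = [
    ("Q_AUTHOR_1", "Smith J. W. X."),
    ("Q_AUTHOR_2", "Smith J. W."),
    ("Q_AUTHOR_3", "Smith J W X"),
]
index = build_index(stated_as_pairs)

# A P2093 value written without periods still finds both matching authors.
candidates = suggest_candidates("Smith J W X", index)
```

Several author items can share the same "stated as" string, so the result is a list of candidates for a human to review rather than an automatic replacement.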

Daniel-Mietchen commented 5 years ago

There is now a Listeria list up at https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData/Wikidata_lists/Author_name_strings_matched_to_author_items_using_Stated_As .

Daniel-Mietchen commented 5 years ago

I tried to adapt that to an institutional context but did not get it to work properly, so here's what I have so far:

SELECT
  # Author of a paper with a "stated as" statement for authorship
  ?item

  # Sample work by the author with that "stated as" value
  ?pub

  # Build URL to the Author disambiguator tool
  (CONCAT(
      '[https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=',
      ENCODE_FOR_URI(?authorstring), ' ',?authorstring , ']') AS ?string_resolver)

  # Number of works with an author name string that matches the one above
  ?count

WITH {
  SELECT
    (COUNT(?work) AS ?count)
    ?authorstring
  WHERE {
    ?work wdt:P2093 ?authorstring  .
    ?work wdt:P50 ?author  .
    { ?author wdt:P108 / wdt:P361* wd:Q213439 .}
    UNION
    { ?author wdt:P463 / wdt:P361* wd:Q213439 .}
    UNION
    { ?author wdt:P1416 / wdt:P361* wd:Q213439 .}    
    FILTER(!regex (?authorstring, "^[A-Za-z]{1}.\\s")).
  }
  GROUP BY ?authorstring
} AS %result
WITH {
  SELECT DISTINCT  ?authorstring ?item #(SAMPLE(?work1) AS ?pub) 
                                 ?count
  WHERE {
  INCLUDE %result
  ?work1 p:P50 ?author_statement .
  ?author_statement ps:P50 ?item .
  ?author_statement pq:P1932 ?authorstring .
  }
  GROUP BY ?authorstring ?item #?pub
                         ?count
} AS %stateds
WHERE {
 INCLUDE %stateds

}
ORDER BY DESC(?count)
LIMIT 200

Daniel-Mietchen commented 5 years ago

This probably needs some more fine-tuning of the regexes, so I played around a bit with https://regex101.com/ .

Daniel-Mietchen commented 5 years ago

A simple change of the regex to

    FILTER(regex (?authorstring, "^(?=^[A-Z][a-z]{1,}.*)(?=.*[a-z]$).*$")).

seems to make this query useful.
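For readers who want to see what that filter keeps, here is a small Python check of the same pattern. It is only a sketch: Python's `re` module is used as a stand-in for the regex engine behind the query service, on the assumption that both treat the lookaheads the same way, and the sample strings are made up for illustration.

```python
import re

# Same pattern as in the FILTER above: the string must start with an
# uppercase letter followed by at least one lowercase letter, and it
# must end with a lowercase letter.
PATTERN = re.compile(r"^(?=^[A-Z][a-z]{1,}.*)(?=.*[a-z]$).*$")


def passes_filter(author_string: str) -> bool:
    """Return True if the author name string would be kept by the filter."""
    return PATTERN.match(author_string) is not None


# Made-up sample strings, not taken from Wikidata.
samples = ["John Smith", "Smith J. W. X.", "J. Smith", "Smith JW"]
results = {s: passes_filter(s) for s in samples}
```

In effect the pattern keeps strings that look like spelled-out names and drops those that begin with an initial or end in an initial or a period.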

Will set up a Listeria page for the query now.

Daniel-Mietchen commented 5 years ago

https://www.wikidata.org/wiki/Wikidata:University_of_Virginia/Listeria/UVa_people/Author_name_strings_matched_to_UVa_people_items_using_Stated_As