WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

Co-author graph in the topic aspect is slow #733

Closed fnielsen closed 4 years ago

fnielsen commented 5 years ago

Co-author graph in the topic aspect is slow, see, e.g., https://tools.wmflabs.org/scholia/topic/Q311383

This problem has also occurred in other aspects, see, e.g., #533

Daniel-Mietchen commented 5 years ago

For some topics like Parkinson's disease, the graph now stalls completely.

I tried to curb that by modifying the query such that it only exposes the most salient co-authorship connections, but in the current version, this times out:

#defaultView:Graph
SELECT ?author1 ?author1Label ?rgb ?author2 ?author2Label
WITH {
  # Find works with the topic
  SELECT ?work WHERE {
    ?work wdt:P921 / (wdt:P31* / wdt:P279* | wdt:P361+ | wdt:P1269+) wd:Q11085 .
  }
} AS %works
WITH {
  # Limit the number of authors
  SELECT (COUNT(?work) AS ?count1) ?author1 WHERE {
    INCLUDE %works
    ?work wdt:P50 ?author1 .
  }
  GROUP BY ?author1
  ORDER BY DESC(?count1)
  LIMIT 10
} AS %authors1
WITH {
  # Limit the number of coauthors
  SELECT (COUNT(?work) AS ?count2) ?author2 WHERE {
    INCLUDE %works
    INCLUDE %authors1
    ?work wdt:P50 ?author1 , ?author2 .
    FILTER (?author1 != ?author2) 
  }
  GROUP BY ?author2
  ORDER BY DESC(?count2)
  LIMIT 10
} AS %authors2
WHERE {
  INCLUDE %works
  INCLUDE %authors1
  INCLUDE %authors2
  OPTIONAL { ?author1 wdt:P21 ?gender1 . }
  BIND( IF(?gender1 = wd:Q6581097, "3182BD", "E6550D") AS ?rgb)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,fr,de,ru,es,zh,ja".
  }
}
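
A note on the likely cost of the final WHERE clause above, offered as a reading of the query rather than a diagnosis confirmed in this thread: %works, %authors1 and %authors2 project no variable in common, so including all three yields one row per combination. The Python sketch below only does the arithmetic. The figure of 50,000 works is made up for illustration and is not a measurement.

```python
from itertools import product


def cross_join_rows(*row_counts):
    """Rows produced when joining result sets that share no variable."""
    total = 1
    for count in row_counts:
        total *= count
    return total


# Tiny check against an explicit Cartesian product.
works = ["w1", "w2", "w3"]
authors1 = ["a1", "a2"]
authors2 = ["b1", "b2"]
assert len(list(product(works, authors1, authors2))) == cross_join_rows(3, 2, 2)

# Hypothetical topic with 50,000 works (an assumption), combined with the
# two subqueries that are each capped by LIMIT 10.
HYPOTHETICAL_WORKS = 50_000
rows = cross_join_rows(HYPOTHETICAL_WORKS, 10, 10)
print(rows)  # 5000000
```

Each of those rows would also pass through the OPTIONAL gender lookup and the label service, which may add to the cost.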

Daniel-Mietchen commented 4 years ago

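I slightly modified the above query, mainly by dropping the INCLUDE %works line from the final WHERE clause, and it now seems to work for busy topics as well. With that line gone, the final clause no longer pairs every matching work with every author row. I played with the LIMITs a bit and think the combination of 25 for %authors1 and 250 for %authors2 works fine.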

So here is the modified version:

#defaultView:Graph
SELECT ?author1 ?author1Label ?rgb ?author2 ?author2Label
WITH {
  # Find works with the topic
  SELECT ?work WHERE {
    ?work wdt:P921 / (wdt:P31* / wdt:P279* | wdt:P361+ | wdt:P1269+) wd:Q11085 .
  }
} AS %works
WITH {
  # Limit the number of authors
  SELECT (COUNT(?work) AS ?count1) ?author1 WHERE {
    INCLUDE %works
    ?work wdt:P50 ?author1 .
  }
  GROUP BY ?author1
  ORDER BY DESC(?count1)
  LIMIT 25
} AS %authors1
WITH {
  # Limit the number of coauthors
  SELECT DISTINCT ?author2 ?author1 (COUNT(?work) AS ?count2) WHERE {
    INCLUDE %works
    INCLUDE %authors1
    ?work wdt:P50 ?author1 , ?author2 .
    FILTER (?author1 != ?author2) 
  }
  GROUP BY ?author2 ?author1 
  ORDER BY DESC(?count2)
  LIMIT 250
} AS %authors2
WHERE {
  INCLUDE %authors2
  OPTIONAL { ?author1 wdt:P21 ?gender1 . }
  BIND( IF(?gender1 = wd:Q6581097, "3182BD", "E6550D") AS ?rgb)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,fr,de,ru,es,zh,ja".
  }
}
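
To try the modified query on other topics, the topic item and the two LIMIT values can be filled in programmatically. The following is a minimal Python sketch and not how Scholia itself templates its queries. The function name build_coauthor_query and its parameters are hypothetical. It uses string.Template because the query text is full of braces and percent signs but contains no dollar signs.

```python
import re
from string import Template

# Same structure as the modified query above, with three placeholders.
# "ja" is used as the language code for Japanese.
QUERY_TEMPLATE = Template("""#defaultView:Graph
SELECT ?author1 ?author1Label ?rgb ?author2 ?author2Label
WITH {
  # Find works with the topic
  SELECT ?work WHERE {
    ?work wdt:P921 / (wdt:P31* / wdt:P279* | wdt:P361+ | wdt:P1269+) wd:$topic .
  }
} AS %works
WITH {
  # Limit the number of authors
  SELECT (COUNT(?work) AS ?count1) ?author1 WHERE {
    INCLUDE %works
    ?work wdt:P50 ?author1 .
  }
  GROUP BY ?author1
  ORDER BY DESC(?count1)
  LIMIT $author_limit
} AS %authors1
WITH {
  # Limit the number of coauthors
  SELECT DISTINCT ?author2 ?author1 (COUNT(?work) AS ?count2) WHERE {
    INCLUDE %works
    INCLUDE %authors1
    ?work wdt:P50 ?author1 , ?author2 .
    FILTER (?author1 != ?author2)
  }
  GROUP BY ?author2 ?author1
  ORDER BY DESC(?count2)
  LIMIT $pair_limit
} AS %authors2
WHERE {
  INCLUDE %authors2
  OPTIONAL { ?author1 wdt:P21 ?gender1 . }
  BIND( IF(?gender1 = wd:Q6581097, "3182BD", "E6550D") AS ?rgb)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,fr,de,ru,es,zh,ja".
  }
}
""")


def build_coauthor_query(topic_qid, author_limit=25, pair_limit=250):
    """Return the co-author graph query for one topic (hypothetical helper)."""
    # Accept only plain item identifiers so nothing else ends up in the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", topic_qid):
        raise ValueError(f"not a Wikidata item identifier: {topic_qid!r}")
    for name, value in (("author_limit", author_limit), ("pair_limit", pair_limit)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
    return QUERY_TEMPLATE.substitute(
        topic=topic_qid, author_limit=author_limit, pair_limit=pair_limit
    )


query = build_coauthor_query("Q11085")
```

The defaults mirror the 25 and 250 limits reported above, so build_coauthor_query("Q11085") reproduces the query for the topic used in this thread.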

Daniel-Mietchen commented 4 years ago

The above commit had an encoding problem, which was fixed with https://github.com/fnielsen/scholia/commit/bb2ee219a259c3dc3ce9d7b4c3acf572b047e600#diff-04a29c3a9ff21a4d023f710f69057842 . Looks good to me now.

Daniel-Mietchen commented 4 years ago

Deployed now. Closing.