WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org

On Scholia landing page, provide some overview stats about Wikidata and scholarly publications in it #336

Closed Daniel-Mietchen closed 4 years ago

Daniel-Mietchen commented 6 years ago

e.g. number of triples in Wikidata

SELECT (count(*) as ?counts) WHERE {
  ?s ?p ?o .
}

and some WikiCite-focused ones, e.g. as per this list

or some version of http://wikicite.org/statistics.html .

fnielsen commented 6 years ago

Now running at https://tools.wmflabs.org/scholia/

Daniel-Mietchen commented 6 years ago

I think adding a few more would be useful, e.g. total number of items and of scientific articles, and then a good selection of properties from the above list and/or from https://www.wikidata.org/wiki/Template:Bibliographic_properties .

Daniel-Mietchen commented 6 years ago

Here is a query that gives a more comprehensive list:


SELECT ?count ?description
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] ?p [] . }
} AS %triples
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P50 []. }
} AS %authors
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P356 []. }
} AS %dois
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P496 []. }
} AS %orcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P698 []. }
} AS %pmids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P932 []. }
} AS %pmcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2093 []. }
} AS %authorstrings
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2860 [] . }
} AS %cites
WHERE {
  {
    INCLUDE %triples
    BIND("Total number of triples" AS ?description)
  }
  UNION
  {
    INCLUDE %pmids
    BIND("Items with a PubMed ID" AS ?description)
  }
  UNION
  {
    INCLUDE %pmcids
    BIND("Items with a PubMed Central ID" AS ?description)
  }
  UNION
  {
    INCLUDE %dois
    BIND("Items with a DOI" AS ?description)
  }
  UNION
  {
    INCLUDE %cites
    BIND("Citations" AS ?description)
  }
  UNION
  {
    INCLUDE %authors
    BIND("Links from items about works to items about their authors" AS ?description)
  }
  UNION
  {
    INCLUDE %authorstrings
    BIND("Author name strings on items about works" AS ?description)
  }
  UNION
  {
    INCLUDE %orcids
    BIND("Items about authors with an ORCID profile that has public content" AS ?description)
  }
}
ORDER BY DESC(?count)
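
The query above repeats the same WITH ... AS %name / INCLUDE %name pattern for every statistic, so it could be generated from a small table instead of being written by hand. A minimal Python sketch of that idea follows; the names `STATS` and `build_stats_query` are hypothetical and not part of Scholia, and only four of the counts are listed:

```python
# Hypothetical table: (named subquery, graph pattern to count, row description).
STATS = [
    ("triples", "[] ?p [] .", "Total number of triples"),
    ("pmids", "[] wdt:P698 [] .", "Items with a PubMed ID"),
    ("dois", "[] wdt:P356 [] .", "Items with a DOI"),
    ("cites", "[] wdt:P2860 [] .", "Citations"),
]


def build_stats_query(stats):
    """Return one SPARQL query with a named subquery and a UNION branch per stat."""
    # One "WITH { ... } AS %name" block per statistic.
    subqueries = [
        "WITH {\n"
        f"  SELECT (COUNT(*) AS ?count) WHERE {{ {pattern} }}\n"
        f"}} AS %{name}"
        for name, pattern, _ in stats
    ]
    # One "{ INCLUDE %name BIND(...) }" branch per statistic.
    branches = [
        "  {\n"
        f"    INCLUDE %{name}\n"
        f'    BIND("{description}" AS ?description)\n'
        "  }"
        for name, _, description in stats
    ]
    return "\n".join(
        ["SELECT ?count ?description"]
        + subqueries
        + ["WHERE {", "\n  UNION\n".join(branches), "}", "ORDER BY DESC(?count)"]
    )


if __name__ == "__main__":
    print(build_stats_query(STATS))
```

Adding a statistic then means adding one row to the table. The sketch does no escaping, so descriptions must not contain double quotes.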

Still missing:

fnielsen commented 6 years ago

Added with https://github.com/fnielsen/scholia/commit/b8f8f6a496d0c8bef0f67fcb96afa31a9725cece and now running at https://tools.wmflabs.org/scholia/

Daniel-Mietchen commented 6 years ago

Here are some further ideas on what to include in these stats:

lucaswerkmeister commented 6 years ago

Number of properties:

SELECT (COUNT(*) AS ?propertyCount) WHERE {
  ?property a wikibase:Property.
}

For the number of triples, you can also use ?s ?p ?o (subject predicate object) instead of [] ?p [] – equivalent but slightly more readable :)

Daniel-Mietchen commented 6 years ago

Thanks, @lucaswerkmeister — I've just included it in the above batch of additional stats.

Daniel-Mietchen commented 6 years ago

The above patch caused display problems, so we reverted it. Here is the query again:

SELECT ?count ?description
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] ?p [] . }
} AS %triples
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { ?property a wikibase:Property. }
} AS %properties
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P50 []. }
} AS %authors
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P69 [] . }
} AS %almamater
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P108 [] . }
} AS %employer
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P166 [] . }
} AS %award_received
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P212 [] . }
} AS %isbn13
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P225 []. }
} AS %taxa
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P234 []. }
} AS %inchi
WITH {
  SELECT (COUNT(DISTINCT ?serials) AS ?count) WHERE { ?serials wdt:P236 [] . }
} AS %issn
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P356 []. }
} AS %dois
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P496 []. }
} AS %orcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P625 []. }
} AS %geoloc
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P638 [] . }
} AS %pdb
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P686 [] . }
} AS %gene
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P698 []. }
} AS %pmids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P699 [] . }
} AS %disease
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P859 [] . }
} AS %sponsor
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P818 [] . }
} AS %arxivID
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P921 []. }
} AS %topics
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P932 []. }
} AS %pmcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P1416 [] . }
} AS %affiliation
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2093 []. }
} AS %authorstrings
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2427 [] . }
} AS %GRID
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2860 [] . }
} AS %cites
WHERE {
  {
    INCLUDE %triples
    BIND("Total number of triples" AS ?description)
  }
  UNION
  {
    INCLUDE %properties
    BIND("Total number of properties" AS ?description)
  }
  UNION
  {
    INCLUDE %pmids
    BIND("Items with a PubMed ID" AS ?description)
  }
  UNION
  {
    INCLUDE %pmcids
    BIND("Items with a PubMed Central ID" AS ?description)
  }
  UNION
  {
    INCLUDE %dois
    BIND("Items with a Digital Object Identifier (DOI)" AS ?description)
  }
  UNION
  {
    INCLUDE %cites
    BIND("Citations" AS ?description)
  }
  UNION
  {
    INCLUDE %authors
    BIND("Links from items about works to items about their authors" AS ?description)
  }
  UNION
  {
    INCLUDE %authorstrings
    BIND("Author name strings on items about works" AS ?description)
  }
  UNION
  {
    INCLUDE %orcids
    BIND("Items about authors with an ORCID profile that has public content" AS ?description)
  }
  UNION
  {
    INCLUDE %taxa
    BIND("Items with a taxon name" AS ?description)
  }
  UNION
  {
    INCLUDE %geoloc
    BIND("Items with a geolocation" AS ?description)
  }
  UNION
  {
    INCLUDE %topics
    BIND("Links from items about works to items about their main subjects" AS ?description)
  }
  UNION
  {
    INCLUDE %inchi
    BIND("Items with an International Chemical Identifier (InChI)" AS ?description)
  }
  UNION
  {
    INCLUDE %isbn13
    BIND("Items with a 13-digit International Standard Book Number (ISBN 13)" AS ?description)
  }
  UNION
  {
    INCLUDE %award_received
    BIND("Links from items about people or others to an award they have received" AS ?description)
  }
  UNION
  {
    INCLUDE %affiliation
    BIND("Links from items about people to items about groups they are affiliated with" AS ?description)
  }
  UNION
  {
    INCLUDE %employer
    BIND("Links from items about people to items about their employer" AS ?description)
  }
  UNION
  {
    INCLUDE %almamater
    BIND("Links from items about people to items about the educational establishments they attended" AS ?description)
  }
  UNION
  {
    INCLUDE %issn
    BIND("Items with an International Standard Serial Number (ISSN)" AS ?description)
  }
  UNION
  {
    INCLUDE %arxivID
    BIND("Items with an arXiv ID" AS ?description)
  }
  UNION
  {
    INCLUDE %GRID
    BIND("Items about institutions with an identifier from the Global Research Identifier Database (GRID ID)" AS ?description)
  }
  UNION
  {
    INCLUDE %sponsor
    BIND("Links from items about anything to items about corresponding sponsors" AS ?description)
  }
  UNION
  {
    INCLUDE %disease
    BIND("Items indexed in the Disease Ontology" AS ?description)
  }
  UNION
  {
    INCLUDE %gene
    BIND("Items indexed in the Gene Ontology" AS ?description)
  }
  UNION
  {
    INCLUDE %pdb
    BIND("Protein structures indexed in the Protein Data Bank" AS ?description)
  }
}
ORDER BY DESC(?count)
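
Note that most subqueries above use COUNT(*), which counts matching statements, while the ISSN subquery uses COUNT(DISTINCT ?serials), which counts items. The two differ whenever an item carries more than one value for a property, so a row labelled "Items with ..." may overstate the number of items. A small Python illustration with invented data (the item IDs and ISBN values below are made up):

```python
# Toy (item, ISBN-13) statements; IDs and values are invented for illustration.
statements = [
    ("Q1", "978-0-00-000001-1"),
    ("Q1", "978-0-00-000002-8"),  # same item, second ISBN
    ("Q2", "978-0-00-000003-5"),
]

# COUNT(*) over "?item wdt:P212 ?isbn" counts every statement.
statement_count = len(statements)

# COUNT(DISTINCT ?item) counts each item once, however many ISBNs it has.
item_count = len({item for item, _ in statements})

print(statement_count, item_count)
```

Here the statement count is 3 but only 2 items have an ISBN, so either the count or the description would need to change for the two to agree.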

Pinging @lucaswerkmeister

lucaswerkmeister commented 6 years ago

What kinds of display problems did it cause?

fnielsen commented 6 years ago

There was no response from WDQS, probably because the query was too long. Perhaps the getJSON call can be changed to a POST request.
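
For illustration only, here is a Python sketch of the GET-versus-POST choice (Scholia itself uses getJSON in the browser, so this is not its implementation). It assumes the public WDQS endpoint at https://query.wikidata.org/sparql; the length threshold and the name `build_request` are hypothetical, and the request is only built, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://query.wikidata.org/sparql"
MAX_URL_LENGTH = 2000  # illustrative threshold, not a documented WDQS limit


def build_request(query, endpoint=ENDPOINT, max_url_length=MAX_URL_LENGTH):
    """Build a GET request for short queries and a form-encoded POST for long ones."""
    params = urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    url = f"{endpoint}?{params}"
    if len(url) <= max_url_length:
        # Short enough: the query travels in the URL.
        return Request(url, headers=headers)
    # Too long for a URL: send the same parameters in the request body.
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return Request(endpoint, data=params.encode("utf-8"), headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it; the long statistics query above would take the POST branch under this threshold.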

lucaswerkmeister commented 6 years ago

WDQS already retries using POST if the GET request fails due to being too long. If I run the query in @Daniel-Mietchen’s comment on WDQS, it works both on index.html and embed.html.

fnielsen commented 6 years ago

"Items about authors with an ORCID profile that has public content" Why "that has public content"?

fnielsen commented 6 years ago

"Items with a 13-digit International Standard Book Number (ISBN 13)" This should be rephrased, as there may be items with multiple ISBNs (there are some, especially Springer volumes).

Daniel-Mietchen commented 4 years ago

I have reworked the query, as per https://github.com/Daniel-Mietchen/ideas/issues/1022#issuecomment-559724135 .