CrossRef / rest-api-doc

Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/

Issue with counts in subject category facet #407

Open jenniferlin15 opened 6 years ago

jenniferlin15 commented 6 years ago

From Daniel Letson at Turnitin

https://api.crossref.org/works?filter=full-text.application:similarity-checking&facet=category-name:*

The result that comes back includes the facets as a list, with a count of works for each facet value, as well as a 'total-results' count and a paginated list of the works themselves, like this:

"facets": {
  "category-name": {
    "value-count": 326,
    "values": {
      "General Medicine": 613792,
      "Linguistics and Language": 330949,
      "Language and Linguistics": 326747,
      ...
    }
  }
},
"total-results": 49270539,

The issue is that when we export this data to a spreadsheet and total the counts across the facet values, the sum is nowhere near the 'total-results' count: in a recently run query, the facet counts summed to 5,927,779, while the total results were 49,270,539. Are most works just not categorized? Or is there a better way we should be counting works by subject matter?
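For anyone wanting to reproduce the comparison without a spreadsheet, here is a minimal Python sketch. It assumes the response's "message" object has already been parsed into a dict shaped like the excerpt above. The function name `summarize_facet` and the `sample_message` dict are illustrative only, and the sample holds just the three values quoted in this report, not the full 326.

```python
def summarize_facet(message, facet="category-name"):
    """Compare the summed per-value facet counts with 'total-results'."""
    values = message["facets"][facet]["values"]
    facet_sum = sum(values.values())
    total = message["total-results"]
    return {
        "facet_sum": facet_sum,
        "total_results": total,
        # Share of the total accounted for by the summed facet counts.
        "ratio": facet_sum / total,
    }


# Illustrative input: only the three facet values quoted in the report.
sample_message = {
    "facets": {
        "category-name": {
            "value-count": 326,
            "values": {
                "General Medicine": 613792,
                "Linguistics and Language": 330949,
                "Language and Linguistics": 326747,
            },
        }
    },
    "total-results": 49270539,
}

summary = summarize_facet(sample_message)
print(summary)
```

Note that the summed facet counts need not equal 'total-results' even in principle: a work with no category is counted in no facet value, and a work with more than one category would presumably be counted once per category.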

ppolischuk commented 6 years ago

Many works do not have any value for the category field, so this is more or less expected behavior. @jenniferlin15 are you able to pass this back to Daniel from Turnitin?

ppolischuk commented 6 years ago

More notes: the REST API applies subject categories, so if there are DOIs without a value for the category field, something might not be right. Here is a DOI without a subject: http://api.crossref.org/works/10.1109/vppc.2008.4677487
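To check individual records for this, a small helper along these lines might be useful. It is a sketch under the assumption that a work's categories appear as a list under a `subject` key in the parsed work record. The helper name `has_subject` and both sample dicts are hypothetical, not real API output.

```python
def has_subject(work):
    """Return True if the parsed work record carries at least one subject."""
    # Assumption: categories live in a list under the "subject" key.
    return bool(work.get("subject"))


# Hypothetical records, for illustration only.
categorized = {"DOI": "10.0000/example-1", "subject": ["General Medicine"]}
uncategorized = {"DOI": "10.0000/example-2"}

print(has_subject(categorized), has_subject(uncategorized))
```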

ppolischuk commented 6 years ago

GO-1113