allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Q: Are normalized venues guaranteed to be unique? #100

Closed: wammar closed this issue 1 year ago

wammar commented 1 year ago

I was playing with the S2 API today with the goal of working toward ways to summarize and compare the research output of individual authors in the context of the communities they contribute to. Ultimately this would involve querying not just the info and papers of a specific author, but also information about the venues and years in which they publish. This leads to the following questions:

  • First, is the "normalized" publication venue field guaranteed to be unique across venues? That is, can we use that as a "key" for uniquely identifying a venue?

  • Second, is there a way to query for all papers at a specific venue in a given year? (I don't see any API endpoint for that, but was curious whether there are any hidden or restricted endpoints that may exist for folks with the "right" API key?)
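The workflow described above (fetch an author's papers, then look at the venues and years they appear in) can be sketched in Python. This is a minimal sketch, assuming the Graph API's author-papers endpoint at `/graph/v1/author/{author_id}/papers` returns paper records carrying `venue` and `year` fields when those are requested; the helper names `author_papers_url` and `tally_by_venue_year` are hypothetical and not part of the original thread.

```python
from collections import Counter
from urllib.parse import urlencode

# Base URL of the Semantic Scholar Graph API, as used elsewhere in this thread.
API_BASE = "https://api.semanticscholar.org/graph/v1"


def author_papers_url(author_id: str, fields: list[str]) -> str:
    """Build a request URL for an author's papers (assumed endpoint path)."""
    # Keep commas literal so the field list reads "title,venue,year".
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{API_BASE}/author/{author_id}/papers?{query}"


def tally_by_venue_year(papers: list[dict]) -> Counter:
    """Count papers per (venue, year) pair; papers with no venue are skipped."""
    counts: Counter = Counter()
    for paper in papers:
        venue = paper.get("venue")
        if venue:  # the venue string may be empty when it is unknown
            counts[(venue, paper.get("year"))] += 1
    return counts
```

Whether the `venue` string is safe to use as the grouping key here is exactly what the first question asks.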

Submitted on behalf of Jeff Heer (via email)

wammar commented 1 year ago

First, is the "normalized" publication venue field guaranteed to be unique across venues? That is, can we use that as a "key" for uniquely identifying a venue?

Yes, we're using a canonical list of unique venue names, and we map the raw venue strings to that list.

Second, is there a way to query for all papers at a specific venue in a given year? (I don't see any API endpoint for that, but was curious whether there are any hidden or restricted endpoints that may exist for folks with the "right" API key?)

You can restrict the results of the /paper/search endpoint to specific venues by adding a venue parameter with a comma-separated list of venue names, e.g. venue=Nature,Science.
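A minimal Python sketch of building such a filtered search URL, assuming the `venue` parameter takes a comma-separated list of venue names as in the `venue=Nature,Science` example; the helper name `paper_search_url` is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def paper_search_url(query: str, venues: list[str], fields: list[str]) -> str:
    """Build a /paper/search URL restricted to the given venues."""
    params = {
        "query": query,
        "venue": ",".join(venues),   # e.g. "Nature,Science"
        "fields": ",".join(fields),  # e.g. "title,venue"
    }
    # Keep commas literal and encode spaces as "+", matching the example URL.
    return f"{SEARCH_URL}?{urlencode(params, safe=',')}"
```

Calling `paper_search_url("covid vaccination", ["Science"], ["title", "venue"])` produces the example input URL.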

Example input:

https://api.semanticscholar.org/graph/v1/paper/search?query=covid+vaccination&venue=Science&fields=title,venue

Expected output:

{
  "total": 978,
  "offset": 0,
  "next": 10,
  "data": [
    {
      "paperId": "0986a0cd217bbe125bf9998efb1e62f9ee93923e",
      "title": "COVID-19 vaccination: The road ahead",
      "venue": "Science"
    },
    {
      "paperId": "02f8f5945cde5a784d2554174ecb7795c95563a8",
      "title": "COVID-19 vaccination passports",
      "venue": "Science"
    },
    {
      "paperId": "e34b9ac93d280f15fa241d3a52636bff7a1e1c02",
      "title": "COVID-19 vaccination and menstruation",
      "venue": "Science"
    },
    {
      "paperId": "e0234a9f1368af2795f7e5f2fa686f334d1eb60a",
      "title": "Israel reports link between rare cases of heart inflammation and COVID-19 vaccination in young men",
      "venue": "Science"
    },
    {
      "paperId": "463e8027a3db3c651a12a68bb375fb6ce6200708",
      "title": "Cash incentives, ethics, and COVID-19 vaccination.",
      "venue": "Science"
    },
    {
      "paperId": "9706185703810e66862c17eaae37f21088f4a996",
      "title": "COVID-19 is 10 times deadlier for people with Down syndrome, raising calls for early vaccination",
      "venue": "Science"
    },
    {
      "paperId": "e036d8e4e7a4fcda89d1afde308ce5697a5045f0",
      "title": "Immunological characteristics govern the transition of COVID-19 to endemicity",
      "venue": "Science"
    },
    {
      "paperId": "d95be8f068f751d101b9918149c165005eff8cc4",
      "title": "Low-dose mRNA-1273 COVID-19 vaccine generates durable memory enhanced by cross-reactive T cells",
      "venue": "Science"
    },
    {
      "paperId": "0cf56e03eb3dd7d1b564f4d1c16bf98635b5a3c4",
      "title": "Monetary incentives increase COVID-19 vaccinations",
      "venue": "Science"
    },
    {
      "paperId": "3271b60c4008db66fada3c979e8505584ca7bc81",
      "title": "Vaccination with BNT162b2 reduces transmission of SARS-CoV-2 to household contacts in Israel",
      "venue": "Science"
    }
  ]
}
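The response is paginated: `offset` is the position of the first returned result, `total` is the number of matches, and `next` is the offset of the following page. A minimal sketch of walking every page, assuming the endpoint accepts an `offset` query parameter and omits `next` on the last page; `fetch_page` is a hypothetical callable (for example, a thin wrapper around an HTTP client that returns the parsed JSON body) so the paging logic can be shown without network access.

```python
from typing import Callable, Iterator


def iter_search_results(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every paper record across pages of a /paper/search response.

    fetch_page(offset) is assumed to return the parsed JSON body for that
    offset, shaped like the example above: {"total", "offset", "next", "data"}.
    """
    offset = 0
    while True:
        page = fetch_page(offset)
        yield from page.get("data", [])
        next_offset = page.get("next")
        # Stop when there is no further page (assumed: no "next" key),
        # or if the server ever fails to advance the offset.
        if next_offset is None or next_offset <= offset:
            break
        offset = next_offset
```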

CC: @power10dan @rodneykinney

rodneykinney commented 1 year ago

Note that year is also available as a search filter, in addition to venue. Also, it's technically possible for two different venues to share a name, whereas publicationVenue.id is guaranteed to be unique.

https://api.semanticscholar.org/graph/v1/paper/search?query=covid+vaccination&venue=Science&year=2019&fields=title,venue,year,publicationVenue
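Following that advice, a grouping step keyed on `publicationVenue.id` instead of the venue name might look like the Python sketch below. It assumes each record was requested with `publicationVenue` in `fields`, as in the URL above, and that `publicationVenue`, when present, is an object with an `id` key; the helper name and the `"unknown"` bucket for papers without a resolved venue are illustrative choices, not API behavior.

```python
from collections import defaultdict


def group_by_venue_id(papers: list[dict]) -> dict[str, list[dict]]:
    """Group paper records by publicationVenue.id rather than by venue name.

    Papers whose publicationVenue is missing or null are collected under the
    hypothetical placeholder key "unknown".
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for paper in papers:
        venue = paper.get("publicationVenue") or {}
        groups[venue.get("id", "unknown")].append(paper)
    return dict(groups)
```

With this keying, two distinct venues that share a display name land in separate groups, which is the ambiguity the comment warns about.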