allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Bug: Legitimate papers removed from release to release #174

Closed kochbj closed 2 months ago

kochbj commented 6 months ago

Describe the Bug

Hi, thanks for creating a great service! I am trying to download all papers from some CS conferences (e.g., AAAI, NeurIPS), but I'm finding that legitimate papers are removed from Semantic Scholar from release to release. I'm not sure how frequently this happens, but it's making it hard to reproduce my research results.

To Reproduce

For example, I found this paper in release 09112023:

{"corpusid":211530585,"externalids":{"ACL":null,"DBLP":"conf/aaai/DorrB0MHSCSZS20","ArXiv":null,"MAG":"2998331601","CorpusId":"211530585","PubMed":null,"DOI":"10.1609/AAAI.V34I05.6269","PubMedCentral":null},"url":"https://www.semanticscholar.org/paper/77e61c39ee59a03be8813f961cb1b327926dcce2","title":"Detecting Asks in Social Engineering Attacks: Impact of Linguistic and Structural Knowledge","authors":[{"authorId":"1752326","name":"B. Dorr"},{"authorId":"2632964","name":"Archna Bhatia"},{"authorId":"36235196","name":"Adam Dalton"},{"authorId":"1505503134","name":"Brodie Mather"},{"authorId":"1505542669","name":"Bryanna Hebenstreit"},{"authorId":"50629423","name":"Sashank Santhanam"},{"authorId":"1470639547","name":"Zhuo Cheng"},{"authorId":"145102721","name":"Samira Shaikh"},{"authorId":"1877429","name":"Alan Zemel"},{"authorId":"1791072","name":"T. Strzalkowski"}],"venue":"AAAI","year":2020,"referencecount":64,"citationcount":3,"influentialcitationcount":0,"isopenaccess":true,"s2fieldsofstudy":[{"category":"Computer Science","source":"s2-fos-model"},{"category":"Computer Science","source":"external"}],"updated":"2022-03-10T06:13:39.710Z"}

But it is not available via the API today. Using a third-party wrapper (I can provide the raw request if needed):

from semanticscholar import SemanticScholar  # third-party wrapper

sch = SemanticScholar()
corpus_ids = ['CorpusId:211530585']
# Request the paper itself plus metadata on the papers that cite it
sch.get_papers(corpus_ids, fields=[
    'title', 'year', 'publicationDate', 'citations.publicationDate',
    'venue', 'citations.year', 'citations.title', 'citations.externalIds',
    'publicationTypes', 'publicationVenue', 'externalIds', 'citations.venue',
])
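
For anyone who wants to rule out the wrapper, the lookup can be sketched as a raw request using only the standard library. This is a rough sketch, not the wrapper's actual implementation: it assumes the public Graph API batch endpoint (`POST /graph/v1/paper/batch`) with a `fields` query parameter and an `ids` JSON body, and the helper names `build_batch_request` and `fetch_papers` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public Graph API paper batch lookup.
BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"


def build_batch_request(paper_ids, fields):
    """Build (but do not send) a POST request for the batch endpoint."""
    query = urllib.parse.urlencode({"fields": ",".join(fields)})
    body = json.dumps({"ids": list(paper_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{BATCH_URL}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_papers(paper_ids, fields, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_batch_request(paper_ids, fields)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hits the network; the ID is the one from the release record above.
    print(fetch_papers(["CorpusId:211530585"], ["title", "year", "venue"]))
```

Unauthenticated requests may be rate limited, so an API key header might be needed for anything beyond a spot check.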

It is also not available via webpage search, suggesting it hasn't been re-identified: https://www.semanticscholar.org/search?q=Detecting%20Asks%20in%20Social%20Engineering%20Attacks&sort=relevance

Expected Behavior

ID permanency for papers.

Actual Behavior

No results


Environment Details

Mac

cfiorelli commented 2 months ago

It still seems to be inaccessible. Are you seeing other examples of this? Usually this type of behavior is either a side effect of paper clustering, or sometimes we take a paper down due to other issues.
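
One way to look for other examples is to compare corpus IDs between two releases. The following is a minimal sketch, assuming each release is available as JSON Lines with a `corpusid` field like the record quoted in the original report; the function names are hypothetical, and how the release files are stored and opened is left to the caller.

```python
import json


def corpus_ids(lines):
    """Collect the `corpusid` of every JSON record in an iterable of JSONL lines."""
    ids = set()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            ids.add(json.loads(line)["corpusid"])
    return ids


def missing_in_newer(old_lines, new_lines):
    """Return sorted corpus IDs present in the older release but absent from the newer one."""
    return sorted(corpus_ids(old_lines) - corpus_ids(new_lines))
```

Any iterable of lines works, for example a file opened with `gzip.open(path, "rt")` if the release shards are gzip-compressed. Holding every ID of a full release in memory may be expensive, so filtering by `venue` before collecting IDs is probably worthwhile. An ID that disappears is not necessarily a deleted paper: if clustering merged it into another record, the paper may still exist under a different corpus ID.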