Closed RobotPsychologist closed 7 months ago
Outstanding action item / notes for me from connecting with RK on this: a user is asking for bulk author search. Thoughts/timeline?
Just to clarify, is there currently a way to find all the papers from a given author? Or is that what is under development?
@ericchagnon15 (+ @RobotPsychologist: upon reread, maybe this solves your ask too?)
I think a great solution might be leveraging the datasets. I've set bold text on the papers and authors items to show what I'm referring to.
Latest Release ID: 2023-10-24
Available datasets in the latest release:
- **Name: authors** Description: The core attributes of an author (name, affiliation, paper count, etc.). Authors have an "authorId" field, which can be joined to the "authorId" field of the members of a paper's "authors" field. 75M records in 30 100MB files.
- Name: citations Description: Instances where the bibliography of one paper (the "citingPaper") mentions another paper (the "citedPaper"), where both papers are identified by the "paperId" field. Citations have attributes of their own (influential classification, intent classification, and citation context). 2.4B records in 30 8.5GB files.
- Name: embeddings-specter_v1 Description: A dense vector embedding representing the contents of the paper. 120M records in 30 28GB files.
- Name: embeddings-specter_v2 Description: A dense vector embedding representing the contents of the paper, generated with SPECTER2. 120M records in 30 28GB files.
- Name: paper-ids Description: Mapping from sha-based ID to paper corpus ID. 450M records in 30 500MB files.
- **Name: papers** Description: The core attributes of a paper (title, authors, date, etc.). 200M records in 30 1.5GB files.
- Name: publication-venues Description: Details about the venues in which papers are published.
- Name: s2orc Description: Full-body paper text parsed from open-access PDFs. Identifies structural elements such as paragraphs, sections, and bibliography entries. 10M records in 30 4GB files.
- Name: tldrs Description: A short natural-language summary of the contents of a paper. 58M records in 30 200MB files.
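To make the join between the authors and papers datasets concrete, here is a rough sketch, not an official recipe. It assumes the dataset files are JSON Lines (one record per line), and the field names `hIndex`, `citationCount`, `title`, and `year` are guesses for illustration; only the `authorId` join key and the paper's `authors` list come from the descriptions above. The helper names (`parse_jsonl`, `select_authors`, `papers_by_author`) and the sample records are made up.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line (assumes JSON Lines input)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_authors(authors: Iterable[dict],
                   min_h_index: int = 0,
                   min_citations: int = 0) -> Dict[str, dict]:
    """Keep authors meeting both thresholds, keyed by authorId.

    "hIndex" and "citationCount" are assumed field names.
    """
    selected = {}
    for author in authors:
        h_index = author.get("hIndex") or 0
        citations = author.get("citationCount") or 0
        if h_index >= min_h_index and citations >= min_citations:
            selected[str(author["authorId"])] = author
    return selected


def papers_by_author(papers: Iterable[dict],
                     author_ids: Set[str]) -> Dict[str, List[dict]]:
    """Group papers under each selected author via the paper's "authors" list."""
    grouped = defaultdict(list)
    for paper in papers:
        for member in paper.get("authors") or []:
            author_id = member.get("authorId")
            if author_id is not None and str(author_id) in author_ids:
                grouped[str(author_id)].append(paper)
    return dict(grouped)


# Made-up sample records standing in for lines read from the dataset files.
authors_lines = [
    '{"authorId": "1", "name": "A. Example", "hIndex": 40, "citationCount": 9000}',
    '{"authorId": "2", "name": "B. Example", "hIndex": 3, "citationCount": 50}',
]
papers_lines = [
    '{"title": "Paper X", "year": 2020, "authors": [{"authorId": "1"}, {"authorId": "2"}]}',
    '{"title": "Paper Y", "year": 2021, "authors": [{"authorId": "2"}]}',
    '{"title": "Paper Z", "year": 2022, "authors": [{"authorId": "1"}]}',
]

selected = select_authors(parse_jsonl(authors_lines), min_h_index=10)
grouped = papers_by_author(parse_jsonl(papers_lines), set(selected))
titles = {aid: [p["title"] for p in ps] for aid, ps in grouped.items()}
print(titles)
```

Both passes stream records one at a time, so only the selected authors and their matched papers are held in memory. With the real files, the in-memory lists could be swapped for open file handles (for example via `gzip.open(path, "rt")` if the files turn out to be gzip-compressed, which is also an assumption here).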
Is your feature request related to a problem? Please describe.
Not exactly a problem, but I am interested in studying authorship patterns across fields: which authors write in multiple fields, how often they publish, and how they compare when filtered by h-index and citation count. A way to bulk download authors based on a condition would then let me search papers by the most prominent authors by year, by field, etc.
Describe the solution you'd like
Essentially the same thing as Paper Bulk Search, but for authors.
Describe alternatives you've considered
Right now I'm just focusing on the analysis of papers by field. I've considered running paper bulk searches and inferring the most influential authors from the most influential papers, but this is likely to miss some.
Additional context
N/A