The goal of this ticket is to create the capability to download large numbers of neuroscientific papers. Ideally these papers should be in a machine-readable format such as text, JSON, HTML, or XML; more complex formats such as PDF should be avoided for the moment, as they would require considerably more complex processing.
We may leverage any of the existing public APIs; see e.g. UC Berkeley Library's page for a list of some of them.
If possible, we want to select only papers related to a specific topic (e.g. "neuroscience") so that we don't ingest many GB of material we don't really need into the database.
If possible, we should download full texts and not just title + abstract + metadata.
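As a starting point, the requirements above (topic filter, full text, machine-readable format) could be sketched as follows. This is only an illustration, not a decided design: it assumes Europe PMC as the API, which this ticket does not prescribe, and the endpoint paths, the `OPEN_ACCESS`/`HAS_FT` query fields, and the `resultList`/`nextCursorMark` response keys should be verified against that service's documentation. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Europe PMC REST API base URL; check the service documentation.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def build_search_url(topic, cursor_mark="*", page_size=100):
    """Build a search URL limited to open-access papers that have full text."""
    # Assumed query syntax: OPEN_ACCESS and HAS_FT are search fields of the API.
    query = f'"{topic}" AND OPEN_ACCESS:y AND HAS_FT:y'
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
        "cursorMark": cursor_mark,
    }
    return f"{BASE_URL}/search?{urllib.parse.urlencode(params)}"


def extract_pmcids(page):
    """Return the PMC IDs contained in one decoded page of search results."""
    results = page.get("resultList", {}).get("result", [])
    return [r["pmcid"] for r in results if "pmcid" in r]


def full_text_url(pmcid):
    """Return the (assumed) URL of the full-text XML for a PMC ID."""
    return f"{BASE_URL}/{pmcid}/fullTextXML"


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_full_text_urls(topic, max_pages=1, fetch=fetch_json):
    """Yield full-text XML URLs for a topic, following cursor-based paging."""
    cursor = "*"
    for _ in range(max_pages):
        page = fetch(build_search_url(topic, cursor))
        for pmcid in extract_pmcids(page):
            yield full_text_url(pmcid)
        next_cursor = page.get("nextCursorMark")
        # Stop when the API reports no further page.
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

Calling `iter_full_text_urls("neuroscience", max_pages=5)` would yield one XML URL per paper; the `fetch` parameter is injectable so the paging logic can be tested without network access. A real downloader would also need rate limiting, retries, and a check of each paper's license.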