Closed by seasidesparrow 2 years ago
We currently harvest based on specific feeds, but we would like to harvest everything that comes out of PNAS.
For current content, we should be harvesting from https://www.pnas.org/rss/recent.xml
For back content, there doesn't appear to be a feed, just a web page: https://www.pnas.org/content/by/year
We also need to completely disconnect harvesting and parsing. The parser currently fetches content on its own, but it should be passed raw data/text from an external harvester the way that other parsers are.
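A minimal sketch of the separation described above, assuming the feed exposes `<item>` elements with `<title>` and `<link>` children. The function names `harvest` and `parse_feed` are hypothetical and are not taken from pyingest. The harvester only fetches and returns raw text; the parser only accepts raw text and never opens a connection.

```python
import urllib.request
import xml.etree.ElementTree as ET


def harvest(url, timeout=30):
    """Harvester (hypothetical name): fetch the feed and return it as raw text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def _local(tag):
    """Drop any XML namespace prefix, e.g. '{ns}item' -> 'item'."""
    return tag.rsplit("}", 1)[-1]


def parse_feed(raw_xml):
    """Parser (hypothetical name): takes raw text only, does no network I/O.

    Assumes each record is an <item> element with <title> and <link>
    children; the real PNAS feed may carry more fields than these.
    """
    root = ET.fromstring(raw_xml)
    records = []
    for elem in root.iter():
        if _local(elem.tag) != "item":
            continue
        # Map each child's local tag name to its stripped text content.
        fields = {_local(child.tag): (child.text or "").strip() for child in elem}
        records.append({
            "title": fields.get("title", ""),
            "link": fields.get("link", ""),
        })
    return records
```

With this split, the external harvester would call something like `raw = harvest("https://www.pnas.org/rss/recent.xml")` and hand `raw` to `parse_feed`, so the parser can be tested against saved feed text without network access.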
Done, see https://github.com/adsabs/adsabs-pyingest/pull/166