adsabs / adsabs-pyingest

Modify PNAS Parser / Harvester #158

Closed by seasidesparrow 2 years ago

seasidesparrow commented 3 years ago

We currently harvest based on specific feeds, but we would like to harvest everything that comes out of PNAS.

For current content, we should be harvesting from https://www.pnas.org/rss/recent.xml

For back content, there doesn't appear to be a feed, just a web page: https://www.pnas.org/content/by/year

seasidesparrow commented 3 years ago

We also need to completely disconnect harvesting and parsing. The parser currently fetches content on its own, but it should be passed raw data/text from an external harvester the way that other parsers are.

seasidesparrow commented 2 years ago

Done, see https://github.com/adsabs/adsabs-pyingest/pull/166