Node / electronic scientific literature package infrastructure

After seeing how dat and other modern Node projects work, I've realised that a lot of the cognitive load of my Node projects comes from trying to keep the entire architecture in my head. With small, micro-functional packages, you only have to think about one package's scope at a time when tackling a problem.

I should iterate towards a package structure roughly like this:

getting papers
- getpapers (just the CLI and basic API)
- getpapers-rest - generalised REST API wrapper for getpapers
  - abstracts out handling of paging, streaming results etc.
  - provides a unified query syntax
- getpapers-rest-{api}, where api is:
  - eupmc
  - arxiv
  - ieee
  - crossref
  - core
- perhaps the same deal for oai-pmh
- perhaps the same deal for data dumps
- quickscrape stays roughly as-is
- journal_scrapers <- unsure whether to keep this as a single repo or make it more modular
- thresher <- needs breaking up and massively simplifying
- nightmare-catcher <- use nightmare to keep a phantom instance running and keep sending scraping work to it
- nightmare-dispatcher <- manage a pool of nightmares with load balancing and rate-limiting
- naparazzi <- take screenshots with a nightmare
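
To make the getpapers-rest idea more concrete, here is a minimal sketch of how a generic wrapper could hide paging behind a per-API adapter. Everything in it is an assumption for illustration: `makeFakeAdapter`, `buildQuery`, `fetchPage`, `search` and `collect` are hypothetical names, not part of getpapers, and the adapter is an in-memory stand-in rather than a real REST client.

```javascript
// Hypothetical sketch: none of these names come from getpapers itself.

// An adapter translates the unified query and fetches one page at a time.
// A real getpapers-rest-{api} package would make HTTP requests here;
// this in-memory stand-in just slices an array so the sketch runs offline.
function makeFakeAdapter(records, pageSize) {
  return {
    name: 'fake',
    // Translate the unified query into whatever the backend expects.
    buildQuery(query) {
      return { term: query.text.toLowerCase() };
    },
    // Return one page of hits plus a cursor for the next page (or null).
    async fetchPage(apiQuery, cursor) {
      const matches = records.filter((r) =>
        r.title.toLowerCase().includes(apiQuery.term)
      );
      const start = cursor || 0;
      const end = start + pageSize;
      return {
        results: matches.slice(start, end),
        nextCursor: end < matches.length ? end : null,
      };
    },
  };
}

// The generic wrapper: callers iterate over results and never see pages.
async function* search(adapter, query) {
  const apiQuery = adapter.buildQuery(query);
  let cursor = null;
  do {
    const page = await adapter.fetchPage(apiQuery, cursor);
    yield* page.results;
    cursor = page.nextCursor;
  } while (cursor !== null);
}

// Convenience helper that drains a search into an array.
async function collect(adapter, query) {
  const out = [];
  for await (const record of search(adapter, query)) out.push(record);
  return out;
}
```

With this shape, each getpapers-rest-{api} package would only need to supply its own `buildQuery` and `fetchPage`, and the CLI could consume `search` as a stream without knowing which API is behind it.
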

normalising metadata
- hypatia - for deduplicating, merging and normalising bibliographic metadata records
- some sort of universal metadata holder format (eventually), as its own package that aggregates lots of individual converters and a validator
  - for now, everything to bibjson (one package per converter)
- library_of_alexandra - actual merged metadata library
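
As a rough illustration of the deduplicate/merge/normalise step planned for hypatia, the sketch below groups records by a normalised key and fills gaps from later duplicates. It is not hypatia's actual API: `recordKey`, `mergeRecords` and `dedupe` are hypothetical names, and the record shape (a `title` string plus an `identifier` list of `{type, id}` objects) is an assumption loosely modelled on bibjson.

```javascript
// Hypothetical sketch: not hypatia's actual API.

// Build a match key: prefer a normalised DOI, fall back to a normalised title.
function recordKey(record) {
  const doi = (record.identifier || []).find((i) => i.type === 'doi');
  if (doi && doi.id) {
    const bare = doi.id
      .trim()
      .toLowerCase()
      .replace(/^https?:\/\/(dx\.)?doi\.org\//, '');
    return 'doi:' + bare;
  }
  const title = (record.title || '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ' ')
    .trim();
  return 'title:' + title;
}

// Merge two records: keep existing values, fill gaps from the newcomer.
function mergeRecords(a, b) {
  const merged = { ...a };
  for (const [field, value] of Object.entries(b)) {
    const current = merged[field];
    const missing =
      current === undefined ||
      current === null ||
      current === '' ||
      (Array.isArray(current) && current.length === 0);
    if (missing) merged[field] = value;
  }
  return merged;
}

// Collapse records sharing a key into one merged record each,
// preserving the order in which keys were first seen.
function dedupe(records) {
  const byKey = new Map();
  for (const record of records) {
    const key = recordKey(record);
    byKey.set(
      key,
      byKey.has(key) ? mergeRecords(byKey.get(key), record) : record
    );
  }
  return [...byKey.values()];
}
```

Exact-key matching like this is deliberately crude. A real merge would probably need fuzzy title matching and a policy for conflicting field values, which this sketch sidesteps by always keeping the first non-empty value.
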

blahah / mozilla_science_fellowship, issue #30