After seeing how dat and other modern node projects work, I've realised a lot of the cognitive load of my node projects comes from trying to keep the entire architecture in my head. By making small, micro-functional packages you can keep your whole mental slate clean when tackling a problem at a particular scope.
I should iterate towards a package structure roughly like this:
getting papers
  getpapers (just the cli and basic api)
  getpapers-rest <- generalised rest-api wrapper for getpapers
    abstracts out handling of paging, streaming results etc.
    provides a unified query syntax
  getpapers-rest-{api} where api is:
    eupmc
    arxiv
    ieee
    crossref
    core
  perhaps the same deal for oai-pmh
  perhaps the same deal for data dumps
  quickscrape stays roughly as-is
  journal_scrapers <- unsure whether to keep this as a single repo or make it more modular
  thresher <- needs breaking up and massively simplifying
  nightmare-catcher <- use nightmare to keep a phantom instance running and keep sending scraping work to it
  nightmare-dispatcher <- manage a pool of nightmares with load balancing and rate-limiting
  naparazzi <- take screenshots with a nightmare
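
To make the getpapers-rest idea a bit more concrete, here is a rough sketch of how the split between the generic wrapper and a per-API package might look. Everything in it is an assumption for illustration: `search`, `offsetAdapter`, `fakeFetch` and the adapter methods (`firstCursor`, `buildRequest`, `parsePage`) are hypothetical names, not part of getpapers, and the "API" is just an in-memory array. The generic part owns the paging loop and streams results one at a time; the adapter only knows how to build a request and read a page.

```javascript
// Hypothetical sketch: none of these names come from getpapers itself.

// Generic pager: walks every page an adapter can produce and yields
// results one at a time, so callers never deal with paging directly.
async function* search(adapter, query, fetchPage) {
  let cursor = adapter.firstCursor();
  while (cursor !== null) {
    const request = adapter.buildRequest(query, cursor);
    const response = await fetchPage(request);
    const { results, nextCursor } = adapter.parsePage(response, cursor);
    for (const result of results) yield result;
    cursor = nextCursor;
  }
}

// Example adapter for an imaginary API that pages by offset.
const offsetAdapter = {
  pageSize: 2,
  firstCursor() { return 0; },
  buildRequest(query, offset) {
    return { q: query.text, offset, limit: this.pageSize };
  },
  parsePage(response, offset) {
    const next = offset + response.items.length;
    const more = response.items.length > 0 && next < response.total;
    return { results: response.items, nextCursor: more ? next : null };
  },
};

// Stand-in for an HTTP call, backed by an in-memory array.
const fakeRecords = ['a', 'b', 'c', 'd', 'e'];
async function fakeFetch({ offset, limit }) {
  return {
    total: fakeRecords.length,
    items: fakeRecords.slice(offset, offset + limit),
  };
}

// Helper that drains an async iterable into an array.
async function collect(iterable) {
  const out = [];
  for await (const item of iterable) out.push(item);
  return out;
}
```

A caller would then write `for await (const paper of search(offsetAdapter, { text: 'malaria' }, fakeFetch)) { ... }` and never see a page boundary. Under this split, a getpapers-rest-{api} package would only need to export an adapter object.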
normalising metadata
  hypatia - for deduplicating, merging and normalising bibliographic metadata records
    some sort of universal metadata holder format (eventually) as its own package that aggregates lots of individual converters and a validator
    for now, everything to bibjson (one package per converter)
  library_of_alexandra - actual merged metadata library
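
As a rough illustration of the deduplicate-and-merge step I have in mind for hypatia, here is a minimal sketch. The field names (`doi`, `title`, `year`) are simplified placeholders rather than the bibjson schema, and `dedupeKey`, `mergeRecords` and `dedupe` are hypothetical names, not an existing API. It assumes two records describe the same paper if they share a DOI, or, when a record has no DOI, if their titles match after stripping case, spacing and punctuation.

```javascript
// Hypothetical sketch: field names (doi, title, year) are simplified
// placeholders, not the bibjson schema or a real hypatia API.

// Build a key that should match for two records describing the same paper.
function dedupeKey(record) {
  if (record.doi) return 'doi:' + record.doi.trim().toLowerCase();
  // Fall back to the title with case, punctuation and spacing stripped.
  const title = (record.title || '').toLowerCase();
  return 'title:' + title.replace(/[^a-z0-9]+/g, '');
}

// Merge two records: keep values already present in `a`, fill gaps from `b`.
function mergeRecords(a, b) {
  const merged = { ...a };
  for (const [field, value] of Object.entries(b)) {
    const current = merged[field];
    if (current === undefined || current === null || current === '') {
      merged[field] = value;
    }
  }
  return merged;
}

// Collapse a list of records into one merged record per key,
// preserving the order in which each key was first seen.
function dedupe(records) {
  const byKey = new Map();
  for (const record of records) {
    const key = dedupeKey(record);
    const existing = byKey.get(key);
    byKey.set(key, existing ? mergeRecords(existing, record) : record);
  }
  return [...byKey.values()];
}
```

This is deliberately naive: a record with a DOI and another without one will never be matched, even if their titles agree, and the first value seen for a field always wins. A real version would need fuzzier matching and some notion of which source to trust.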