Configurable metadata aggregator and crosswalk for NYU Libraries collections designed to populate Primo. Can run as a web server and dynamically update document cache.
> docker build -t primo-endpoint .
> docker run -p 80 primo-endpoint
Logs to stdout by default. Startup can be optimized by persisting the /cache volume.
> curl -sSL https://get.haskellstack.org/ | sh
> stack install
Usage: primo-endpoint [OPTION...]
  -c FILE    --config=FILE         Load configuration from FILE [config.yml]
  -a FILE    --auth=FILE           Load auth rules from FILE [auth.yml]
  -C DIR     --cache=DIR           Use DIR for cache files [$XDR_CACHE_DIR/primo-endpoint]
  -f         --force               Force an initial update of all collections
  -o[DEST]   --output[=DEST]       Write JSON output to file [-]
  -w[PORT]   --web-server[=PORT]   Run a web server on PORT [80] to serve the result
  -l         --log-access          Log access to stdout
  -v         --verbose             Log collection refreshes to stdout
The configuration is read from a YAML (or JSON) file with the following structure:
interval
: number of seconds for which to cache collections before reloading (by default)
fda
: FDA-specific configuration options:
    collections
    : maximum number of collections to load from index to use in translating hdls to ids
generators
: a set of named generator "macro" functions that can be used as generator keys, substituting passed object arguments for input fields
templates
: a set of named field generator templates, each of which contains a set of field generators
collections
: a set of named collections, each with the following fields:
    source
    : a source type (see below), which may also take additional arguments on the collection object
    template
    : optional string or array of 0 or more templates (referencing names in the templates object), which are all unioned together
    fields
    : additional local "custom" generator fields for this collection

See config.yml for an example.
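As a rough illustration of the structure described above, the following Python sketch builds a configuration object and prints it as JSON, which the endpoint is stated to accept in place of YAML. Everything beyond the documented key names is an assumption: the collection name `example`, the template name `basic`, the output field names, the numeric values, and the `SOURCE_TYPE` placeholder (which stands in for a real source value) are all hypothetical. It also assumes that templates and `fields` map output field names to field definitions, and it omits `generators`. The shipped config.yml remains the authoritative example.

```python
import json

# Hypothetical configuration mirroring the documented top-level keys.
config = {
    # Default cache lifetime for collections, in seconds (illustrative value).
    "interval": 86400,
    # FDA-specific options (illustrative value).
    "fda": {"collections": 100},
    # Named templates; assumed to map output field names to field definitions.
    "templates": {
        "basic": {
            "title": {"field": "title"},
            "publisher": {"string": "New York University"},
        }
    },
    # Named collections, each with source, template, and fields.
    "collections": {
        "example": {
            "source": "SOURCE_TYPE",  # placeholder, not a real source value
            "template": ["basic"],    # a string or an array of template names
            "fields": {
                # Local field added on top of the template (hypothetical name).
                "id": {"handle": {"field": "identifier.uri"}},
            },
        }
    },
}

config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON could be saved to a file and passed with `--config=FILE`, once the placeholder source has been replaced by one of the supported source values.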
Each collection can have one of the following source values to specify the endpoint to pull from:
https://archive.nyu.edu/rest/collections/$id
: requires id (internal) or hdl (suffix)
http://discovery.dlib.nyu.edu:8080/solr3_discovery/$core/select
: requires core (core (none), viewer, or nyupress) and code (collection code)
http://dlib.nyu.edu/$path
: requires path
https://geo.nyu.edu/catalog
: (filtered on dct_provenance_s=NYU)
https://specialcollections.library.nyu.edu/search/catalog.json
: requires filters object mapping field to value
http://isaw.nyu.edu/publications/awol-index/awol-index-json.zip
: (filtered on is_part_of=null)
file or url
: mainly for testing purposes

Field definitions are made up of the following:
field
: name of source field to copy
string
: string literal to create single value
paste
: list of definitions, or string with $field or ${field} placeholders to substitute ($$ for a literal $); the resulting strings are pasted together (no delimiter) as a cross-product (so the number of resulting values is the product of the number of values from each element)
handle
: definition. Convert a string of the form "http://hdl.handle.net/XXX/YYY.ZZZ" to "hdl-handle-net-XXX-YYY-ZZZ". Any non-matching input is discarded.
value
: any definition (for convenient nesting)
date
: string strptime format. Tries to parse each value in the result with the given format and produces a timestamp in standard format (relevant prefix of "%Y-%m-%dT%H:%M:%S%QZ") as output. Any inputs that cannot be parsed are discarded.
match
: match input against regular expressions
    ' (apostrophe)
    : the input string after the (first) match
    &
    : the matching segment of the input string
    0
    : same as &
    1...N
    : the string matched by each parenthesized group in the regular expression
limit
: integer. Take only the first n values from the input, discarding the rest.
default
: definition. If there are no produced input values, provide the definition instead.
join
: string literal delimiter. Paste all the inputs together, separated by the given delimiter. Always produces exactly one output.
., _, and alphanumerics
: passed to field
paste
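
The multi-valued behaviour of several field definitions above can be sketched in Python. This is not the endpoint's implementation, only a model of the documented semantics in which every definition produces a list of strings. The function names mirror the definition keys. The regular expression for handles is an assumption: it treats XXX, YYY, and ZZZ as runs of characters containing neither `/` nor `.`. The sample values are made up.

```python
import re
from itertools import product

def paste(parts):
    """Concatenate one value from each part, for every combination (cross-product)."""
    return ["".join(combo) for combo in product(*parts)]

# Assumed shape of a handle URL: http://hdl.handle.net/XXX/YYY.ZZZ
HANDLE_RE = re.compile(r"^http://hdl\.handle\.net/([^/.]+)/([^/.]+)\.([^/.]+)$")

def handle(values):
    """Rewrite handle URLs as hdl-handle-net-XXX-YYY-ZZZ, dropping non-matching input."""
    out = []
    for value in values:
        m = HANDLE_RE.match(value)
        if m:
            out.append("hdl-handle-net-" + "-".join(m.groups()))
    return out

def limit(values, n):
    """Keep only the first n values."""
    return values[:n]

def default(values, fallback):
    """Use the fallback values only when the input produced nothing."""
    return values if values else fallback

def join(values, delimiter):
    """Always return exactly one value, even for empty input."""
    return [delimiter.join(values)]

# Made-up source document: each field holds a list of values.
doc = {
    "contributor.author": ["Ada", "Grace"],
    "date.issued": ["1999"],
    "identifier.uri": ["http://hdl.handle.net/2451/12345.6", "not a handle"],
}

# Two authors times one date gives two pasted values.
print(paste([doc["contributor.author"], [" ("], doc["date.issued"], [")"]]))
# → ['Ada (1999)', 'Grace (1999)']

print(handle(doc["identifier.uri"]))
# → ['hdl-handle-net-2451-12345-6']

print(join(limit(doc["contributor.author"], 1), "; "))
# → ['Ada']

print(default(doc.get("subject", []), ["(none)"]))
# → ['(none)']
```

One consequence of the cross-product rule in this model is that `paste` yields no values at all if any of its elements is empty, which is where `default` becomes useful.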
There are two special input fields added to every source document:
_key
: The collection key
_name
: The collection name field

Mapping to NYUCore for that collection:
identifier:  ["identifier.uri","identifier.citation"]
title:       ["title"]
creator:     ["contributor.author"]
description: ["description"]
date:        ["date.issued"]
publisher:   ["publisher.place:publisher,date.issued"]
format:      ["format"]
rights:      ["rights"]
subject:     ["subject"]
relation:    ["identifier.citation"]