Docmaps-Project / docmaps

Extensible protocol for document history metadata exchange, to enable trustworthy, rapid, open science, by and for preprint science communities.
MIT License

Add etl cli #42

Closed. ships closed this pull request 1 year ago

ships commented 1 year ago

Description

This PR adds a CLI tool, written in TypeScript, that creates docmaps from an external source of similar data. In this MVP, it infers docmap-like information from the Crossref API and makes several assumptions along the way. The eventual goal is to extract the upstream adapter as a pluggable component so that the CLI becomes generic.

Basic usage can be discovered by running pnpm start help. A typical invocation looks something like this:

pnpm start item --source crossref-api 10.5194/angeo-40-247-2022

Recall that a Docmap's core datapoints are a collection of interconnected steps, accompanied by additional metadata. However, the docmap does not explicitly have a "subject" that is a DOI -- this can sometimes be inferred from its ID or its steps.

The CLI follows a basic recursive routine. It creates a Step for the identified DOI; if any review articles refer to that DOI, an additional Step is added after the main one. Further, if there is a Preprint for the identified DOI, the CLI recursively invokes the same routine on the Preprint and prepends the result to the step list. Once all the Steps are identified, they are wired together using next-step and previous-step, in a slightly hacky way, and fed into a Docmap.
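The routine described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the WorkRecord shape, the Lookup function (an in-memory stand-in for the Crossref API), the stepsForDoi and wireSteps helpers, and the _:b0-style step ids are all hypothetical names chosen for the example, and the Step shape is far simpler than a real Docmap step.

```typescript
// Hypothetical, simplified shapes. These are assumptions for illustration,
// not the real docmaps types or Crossref API responses.
interface WorkRecord {
  doi: string;
  preprintDoi?: string; // DOI of a preprint of this work, if one is known
  reviewDois: string[]; // DOIs of review articles that refer to this work
}

interface StepDraft {
  description: string;
}

interface Step extends StepDraft {
  "next-step"?: string;
  "previous-step"?: string;
}

// Stand-in for the upstream adapter (Crossref in the MVP).
type Lookup = (doi: string) => WorkRecord | undefined;

// Build the ordered list of step drafts for a DOI:
// steps of the preprint (recursively) come first, then the main step,
// then a review step if any reviews refer to this DOI.
function stepsForDoi(
  doi: string,
  lookup: Lookup,
  seen: Set<string> = new Set(),
): StepDraft[] {
  if (seen.has(doi)) return []; // guard against cyclic preprint links
  seen.add(doi);

  const work = lookup(doi);
  if (!work) return [];

  const earlier = work.preprintDoi
    ? stepsForDoi(work.preprintDoi, lookup, seen)
    : [];

  const own: StepDraft[] = [{ description: `published ${work.doi}` }];
  if (work.reviewDois.length > 0) {
    own.push({ description: `reviewed ${work.doi}` });
  }
  return [...earlier, ...own];
}

// Assign ids and wire neighbours together with next-step / previous-step.
function wireSteps(drafts: StepDraft[]): {
  "first-step": string | undefined;
  steps: Record<string, Step>;
} {
  const steps: Record<string, Step> = {};
  drafts.forEach((draft, i) => {
    const step: Step = { ...draft };
    if (i > 0) step["previous-step"] = `_:b${i - 1}`;
    if (i < drafts.length - 1) step["next-step"] = `_:b${i + 1}`;
    steps[`_:b${i}`] = step;
  });
  return { "first-step": drafts.length > 0 ? "_:b0" : undefined, steps };
}
```

Calling wireSteps(stepsForDoi(doi, lookup)) with a lookup backed by a small in-memory table is enough to exercise the ordering. Keeping the lookup as a parameter mirrors the stated goal of making the upstream adapter pluggable.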

Related Issues

#37 - Crossref-to-Docmaps

#24 - Example using fp-ts to parse

Checklist


3mcd commented 1 year ago

Looks great to me! Tests look comprehensive enough, and I feel I have a good grasp of how it works after your walkthrough. And thank you for adding those comments!