Docmaps-Project / docmaps

Extensible protocol for document history metadata exchange, to enable trustworthy, rapid, open science, by and for preprint science communities.
MIT License

ts-etl: Implement generic/pluggable graph based docmap creation #84

Open ships opened 1 year ago

ships commented 1 year ago

Feature Request

Packages to improve:

Description

The ETL currently works only with Crossref, and only with a relatively prescriptive linear (but recursive) constructive flow. Most pluggable ETL contexts will need a generic graph interface. This has been considered low-priority so far because there are no consumers trying to implement this yet.

However, we are already finding this problematic in standard Crossref use cases where the relations contain a peculiar circularity. Such relations can send the current recursive logic into a bottomless search. In these cases we need a graph implementation with a search footprint instead. From my random sampling of the Crossref database, certain DOI prefixes appear more or less prone to this behavior, and it may appear in up to 10% of cases.

Because these are legitimate documents, I think the time has come to prioritize this implementation.

Use case

  1. We should be able to use the ETL-based SPA to retrieve a docmap for 10.1130/G50960.1.
  2. A consumer should be able to write a plugin for the ETL that creates docmaps of arbitrary complexity.

Proposed solution

The plugin API should expose a push-pop queue for BFS/DFS. A plugin needs to process elements from the queue one at a time, decide whether to create a docmapsy object for each, and insert that object into a graph (assess complexity?); it can then optionally push more items onto the queue. The queue should automatically handle the search footprint and dupe prevention.
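
To make the proposal concrete, here is a minimal sketch of what such a queue-driven crawl could look like. Every name in it (`crawl`, `Plugin`, `VisitResult`, `Relation`, `maxVisits`) is hypothetical and not part of ts-etl today, and the DOIs are made up. It assumes the plugin returns the ids it wants to visit next, and that the driver (not the plugin) owns the footprint set.

```typescript
// Hypothetical shapes for illustration only; nothing here exists in ts-etl.
type Relation = { from: string; to: string; kind: string };

interface VisitResult<T> {
  node: T | null;      // null means "no docmapsy object for this id"
  edges: Relation[];   // relations discovered while visiting
  next: string[];      // ids the plugin would like to visit next
}

interface Plugin<T> {
  visit(id: string): VisitResult<T>;
}

interface Graph<T> {
  nodes: Map<string, T>;
  edges: Relation[];
}

function crawl<T>(
  start: string,
  plugin: Plugin<T>,
  mode: "bfs" | "dfs" = "bfs",
  maxVisits = 1000, // assumed safety cap on the number of visits
): Graph<T> {
  const graph: Graph<T> = { nodes: new Map(), edges: [] };
  const seen = new Set<string>([start]); // the search footprint
  const queue: string[] = [start];
  let visits = 0;

  while (queue.length > 0 && visits < maxVisits) {
    // BFS takes from the front of the queue, DFS from the back.
    const id = mode === "bfs" ? queue.shift()! : queue.pop()!;
    visits++;

    const { node, edges, next } = plugin.visit(id);
    if (node !== null) graph.nodes.set(id, node);
    graph.edges.push(...edges);

    // Dupe prevention: an id is enqueued at most once, so cycles terminate.
    for (const candidate of next) {
      if (!seen.has(candidate)) {
        seen.add(candidate);
        queue.push(candidate);
      }
    }
  }
  return graph;
}

// Usage: three made-up DOIs whose relations form a cycle.
const relations = new Map<string, string[]>([
  ["10.5555/a", ["10.5555/b"]],
  ["10.5555/b", ["10.5555/c", "10.5555/a"]],
  ["10.5555/c", ["10.5555/a"]],
]);

const visited: string[] = [];
const examplePlugin: Plugin<{ doi: string }> = {
  visit(id) {
    visited.push(id);
    const next = relations.get(id) ?? [];
    return {
      node: { doi: id },
      edges: next.map((to) => ({ from: id, to, kind: "related" })),
      next,
    };
  },
};

const result = crawl("10.5555/a", examplePlugin);
```

In this sketch the plugin never sees the footprint, so it cannot loop even if the upstream relations are circular. Whether edges to already-seen ids should still be recorded (as they are here) is an open design choice.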

Additional information

Will be a big step forward on #48.