The ETL currently works only with Crossref, and only with a relatively prescriptive linear (but recursive) constructive flow. Most pluggable ETL contexts will need a generic graph interface. This has been low priority so far because no consumers are trying to implement one.
However, this is already causing problems in standard Crossref use cases where the relations are circular. Such relations can send the current recursive logic into a bottomless search. These cases need a graph implementation with a search footprint instead. From random sampling of the Crossref database, certain DOI prefixes appear more prone to this behavior than others, and it may occur in up to 10% of cases.
Because these are legitimate documents, I think it is time to prioritize this implementation.
Use case
We should be able to use the SPA built on the ETL to retrieve a docmap for 10.1130/G50960.1.
A consumer should be able to write a plugin for the ETL that creates docmaps of arbitrary complexity.
Proposed solution
The plugin API should expose a push-pop queue for BFS/DFS. A plugin needs to process elements one at a time from the queue, decide whether to create a docmapsy object for each one, and insert that object into a graph (assess complexity?). It can then optionally push more items onto the queue. The queue should automatically handle the search footprint and dupe prevention.
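To make the proposal concrete, here is a minimal sketch of what such a queue could look like. It assumes "search footprint" means the set of keys already accepted by the queue. Every name in it (`SearchQueue`, `Plugin`, `runSearch`, the `doi:` keys) is hypothetical and not part of the existing ETL, and it is synchronous for brevity, whereas a real plugin would presumably fetch from Crossref asynchronously.

```typescript
// Illustrative only: none of these names exist in the current ETL.
type Mode = "bfs" | "dfs";

class SearchQueue<K> {
  private items: K[] = [];
  // Assumed meaning of "search footprint": every key ever accepted.
  private footprint = new Set<K>();

  constructor(private mode: Mode = "bfs") {}

  // Returns true if the key was new and enqueued, false if it was a duplicate.
  push(key: K): boolean {
    if (this.footprint.has(key)) return false;
    this.footprint.add(key);
    this.items.push(key);
    return true;
  }

  // BFS takes from the front of the list, DFS from the back.
  pop(): K | undefined {
    return this.mode === "bfs" ? this.items.shift() : this.items.pop();
  }
}

interface Plugin<K, N> {
  // Return a node to insert into the graph, or undefined to skip this key.
  process(key: K): N | undefined;
  // Return further keys to explore; the queue filters out duplicates.
  neighbors(key: K): K[];
}

function runSearch<K, N>(start: K, plugin: Plugin<K, N>, mode: Mode = "bfs"): Map<K, N> {
  const queue = new SearchQueue<K>(mode);
  const nodes = new Map<K, N>();
  queue.push(start);
  for (let key = queue.pop(); key !== undefined; key = queue.pop()) {
    const node = plugin.process(key);
    if (node !== undefined) nodes.set(key, node);
    for (const next of plugin.neighbors(key)) queue.push(next);
  }
  return nodes;
}

// Made-up circular relations: b and c both point back to a.
const relations: Record<string, string[]> = {
  "doi:a": ["doi:b"],
  "doi:b": ["doi:c", "doi:a"],
  "doi:c": ["doi:a"],
};

const result = runSearch<string, { id: string }>("doi:a", {
  process: (id) => ({ id }),
  neighbors: (id) => relations[id] ?? [],
});
// The search terminates with one node per key despite the cycle.
```

Keeping the footprint inside the queue means a plugin author never has to track visited items, which is the property that would stop the bottomless search described above.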
Additional information
This will be a big step forward on #48.