VirtualFlyBrain / vfb-pipeline-collectdata

Apache License 2.0

Allow configuration to support data staging #8

Closed dosumis closed 3 years ago

dosumis commented 3 years ago

We originally discussed supporting a parallel staging pipeline:

https://github.com/VirtualFlyBrain/neo4j2owl/issues/18#issuecomment-651678948

This doesn't seem to have happened, but it looks like it would be relatively straightforward to lightly edit the SPARQL used for filtering our embargoed data: https://github.com/VirtualFlyBrain/vfb-pipeline-collectdata/tree/master/sparql

We would need a config for the parallel staging pipeline that would allow through DataSets where production: False, staging: True.
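To make the spec concrete, here is a minimal Python sketch of the intended filtering rule. It is only an illustration, not the pipeline's actual code: the real filtering lives in the SPARQL queries linked above. The `DataSet` class, the `allowed` function and the `pipeline` argument are hypothetical names; only the `production` and `staging` flags come from the spec. It also assumes that the staging pipeline should let through already-public DataSets in addition to staged ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    """Hypothetical stand-in for a DataSet with its two release flags."""
    short_form: str
    production: bool
    staging: bool


def allowed(dataset: DataSet, pipeline: str) -> bool:
    """Return True if the dataset may pass the embargo filter for `pipeline`."""
    if pipeline == "production":
        # Production only releases DataSets explicitly flagged for production.
        return dataset.production
    if pipeline == "staging":
        # Staging additionally allows production: False, staging: True.
        # Assumption: public DataSets are also visible on staging.
        return dataset.staging or dataset.production
    raise ValueError(f"unknown pipeline: {pipeline!r}")


datasets = [
    DataSet("public_ds", production=True, staging=True),
    DataSet("staged_ds", production=False, staging=True),
    DataSet("embargoed_ds", production=False, staging=False),
]

production_release = [d.short_form for d in datasets if allowed(d, "production")]
staging_release = [d.short_form for d in datasets if allowed(d, "staging")]
```

With this rule, a DataSet flagged production: False, staging: True appears only in the staging release, and one with both flags False appears in neither.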

CC @Robbie1977 - please check my spec here.

matentzn commented 3 years ago

I can start working on this by the end of the week, but the more labour-intensive part will be physically setting up the parallel pipeline (a parallel triple store, pdb, owlery, etc.). Maybe the first step would be to actually mirror the existing pipeline physically on Jenkins (vfb-pipeline2-devstage) and then start working with config to allow unpublished data to seep through?

Robbie1977 commented 3 years ago

I will set up a dual pipeline. Is the whole thing needed, or simply the dump stage and beyond?

matentzn commented 3 years ago

Unfortunately, everything except the KB is needed, because the embargoing happens pre-triplestore.

matentzn commented 3 years ago

Fixed in https://github.com/VirtualFlyBrain/vfb-pipeline-collectdata/pull/9