Closed · cschloer closed this issue 4 years ago
Hi @akariv, could you please take a look?
@cschloer I'm putting it on hold from my side for now because:
- laminar-server
P.S. Just curious: is there a possibility to create a bridge between DPP and dataflows and run dataflows under the hood instead of DPP? It would feel more natural when we need to get a response (data/metadata) programmatically.
I think it would be great to transition to only using dataflows. Most of our custom processors are still written in the old DPP structure, but it would be worth it to transition them entirely to dataflows if it meant getting access to things programmatically. My only concern is keeping the structure of pipeline-spec.yaml --> some open source, widely available program --> processed data. It would theoretically be easy to write something that parses a pipeline-spec.yaml and runs all of the relevant dataflows, but it's also important that it be easy to download and easy to run on the command line (which is why dpp was so great).
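To make the "parse a pipeline-spec.yaml and run the relevant dataflows" idea concrete, here is a minimal sketch. The spec is shown as an already-parsed Python dict (to avoid a YAML dependency); the layout mirrors DPP's spec format, but the `steps_for` helper and the pipeline/processor names are invented for illustration:

```python
# Hypothetical example: a pipeline-spec.yaml after YAML parsing,
# written inline as a dict. Pipeline and processor names are made up.
spec = {
    "my-pipeline": {
        "pipeline": [
            {"run": "load", "parameters": {"from": "data.csv"}},
            {"run": "my_custom_processor", "parameters": {"factor": 2}},
            {"run": "dump_to_path", "parameters": {"out-path": "out"}},
        ]
    }
}

def steps_for(spec, name):
    """Return (processor, parameters) pairs in execution order,
    ready to be mapped onto dataflows steps by a runner."""
    return [(step["run"], step.get("parameters", {}))
            for step in spec[name]["pipeline"]]
```

A runner built on top of this would only need to resolve each processor name to a dataflows step and chain them, which is the easy part; packaging it so it is one `pip install` and one CLI command is the part that made dpp convenient.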
Yea I think an ability to run DPP steps in dataflows/dataflows-bridge would be great
Is there any way to expose the DPP functionality to Python? I played around a bunch with importing the functions here, but in the end the only thing I was really able to do was save a pipeline-spec.yaml to the file system and open a subprocess to run DPP.
It would be great if:
To elaborate a bit on #3: currently I am programmatically adding a dump_to_path step at the very end, waiting for the pipeline to finish, and then reading the file that was dumped. It would be much better if the results of the pipeline (i.e. all of the rows and the datapackage) were just returned by the function call.
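For what it's worth, dataflows already returns results in roughly this shape: `Flow(...).results()` hands back the processed rows together with the datapackage descriptor, with no dump-and-reread round trip. As a toy sketch of the behaviour being asked for (everything below is invented for illustration, not the DPP or dataflows API):

```python
def run_flow(rows, steps):
    """Apply each step (a callable row -> row) in order and return the
    processed rows plus a minimal datapackage-style descriptor, instead
    of dumping to a path and re-reading the file."""
    for step in steps:
        rows = [step(row) for row in rows]
    datapackage = {"resources": [{"name": "result", "count": len(rows)}]}
    return rows, datapackage

rows, dp = run_flow(
    [{"x": 1}, {"x": 2}],
    [lambda row: {"x": row["x"] * 10}],
)
```

The point is the return shape: the caller gets the data and the metadata directly, so the trailing dump_to_path step becomes optional rather than mandatory.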