SolidLabResearch / Challenges


An orchestrator for ease of workflow management #47

Closed s-minoo closed 1 year ago

s-minoo commented 2 years ago

This challenge has been split into 3 separate challenges: #50 #51 #52

Pitch

Undoubtedly, data will flow from pod to pod in the Solid ecosystem. Applications can create ad-hoc solutions to fetch and transfer data from one pod to another, but interoperable orchestration of those data flows makes the solution more scalable. Think, for example, of a workflow that extracts Strava data from the Strava API, maps it with the RMLStreamer into an LDES data stream, and then bucketizes that stream to create aggregated statistics, such as how many runs you did last week and how many kilometers you covered. Without an orchestration component, this flow has to be re-implemented again and again for different use cases. An implementation-independent, interoperable solution is needed.

Existing frameworks for workflow management, such as NiFi, Oozie, Airflow, and Dagster, restrict users to the context of the framework, be it in terms of programming language, limited API extensibility, or a fixed orchestration mechanism. On the other hand, DSL-based workflow management tools such as Toil and Snakemake are limited in the tasks they support, which include only Bash scripts.

Nextflow solves the aforementioned problems of workflow management systems; however, it only supports file-based channels for data transfer. It cannot set up a workflow whose processors use arbitrary channels, such as Kafka, for data transfer.

The aforementioned tools also lack semi-automatic generation of a workflow plan and require the user to define the plan explicitly. Therefore, a generic and modular orchestrator that manages not only workflows but also the orchestration of different microservices and apps would be beneficial in the context of Solid, for example to set up and orchestrate the different components needed for LDES generation. Furthermore, it would provide a strong foundation for a more modular data processing workflow architecture that does not rely on an existing data processing tech stack.

Desired solution

Acceptance criteria

Precondition

Demonstrator

In the context of workflow setups, developers need to connect individual components with each other to compose the workflow. For example, to generate LDES data from existing heterogeneous data sources, a typical workflow could look something like this:

  1. Data fetching from data source
  2. Mapping fetched data to RDF quads
  3. Feeding RDF quads to an LDES server

The developer runs the orchestrator with the provided config files for processors and channels to generate a workflow plan. The workflow plan could then be executed by the orchestrator, or tuned manually if desired before executing it with the orchestrator.
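As a rough illustration of what "generating a workflow plan" from processor and channel configuration could mean, here is a minimal Python sketch. It assumes each processor declares the channels it reads from and writes to, and derives an execution order by matching producers to consumers. The `Processor` class, `generate_plan` function, and the processor and channel names are all hypothetical; the challenge does not prescribe any of them.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Processor:
    """Hypothetical processor description, as it might be read from a config file."""
    name: str
    reads: tuple = ()   # names of channels this processor consumes
    writes: tuple = ()  # names of channels this processor produces


def generate_plan(processors):
    """Order processors so that every producer comes before its consumers."""
    # Map each channel name to the processor that writes to it.
    producers = {ch: p.name for p in processors for ch in p.writes}
    # For each processor, collect the processors it depends on.
    graph = {
        p.name: {producers[ch] for ch in p.reads if ch in producers}
        for p in processors
    }
    return list(TopologicalSorter(graph).static_order())


# The three-step LDES example, deliberately listed out of order.
processors = [
    Processor("ldes-feeder", reads=("quads",)),
    Processor("rdf-mapper", reads=("raw",), writes=("quads",)),
    Processor("fetcher", writes=("raw",)),
]

plan = generate_plan(processors)
print(plan)  # ['fetcher', 'rdf-mapper', 'ldes-feeder']
```

The resulting list is the kind of plan a developer could inspect or tune by hand before handing it back to the orchestrator for execution.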

The orchestrator could also start the necessary services, such as Kafka brokers, and gracefully stop the running processors in the workflow.
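One possible way to handle this start-up and graceful shutdown, sketched in Python under the assumption that every service or processor exposes `start` and `stop` operations: components are started in order and stopped in reverse order, even if the workflow fails midway. `ManagedComponent` and `run_workflow` are hypothetical names, and the sketch only records events instead of launching real processes.

```python
from contextlib import ExitStack


class ManagedComponent:
    """Hypothetical stand-in for a service (e.g. a Kafka broker) or a processor."""

    def __init__(self, name, events):
        self.name = name
        self.events = events  # shared list used here only to record what happened

    def start(self):
        self.events.append(f"start {self.name}")

    def stop(self):
        self.events.append(f"stop {self.name}")


def run_workflow(components, work):
    """Start components in order, run the work, then stop them in reverse order."""
    with ExitStack() as stack:
        for component in components:
            component.start()
            # Registered callbacks run last-in, first-out when the block exits,
            # including when `work` raises an exception.
            stack.callback(component.stop)
        work()


events = []
components = [
    ManagedComponent(name, events)
    for name in ("kafka-broker", "fetcher", "rdf-mapper")
]
run_workflow(components, lambda: events.append("workflow running"))
print(events)
```

Stopping in reverse order means consumers shut down before the services they depend on, which is one reasonable interpretation of "gracefully" here.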

Pointers

Scenarios

pheyvaer commented 2 years ago

Hi @s-minoo

Great idea! Because of the different things that are described here, I think this is better described as a scenario, with separate, smaller challenges extracted from it.

s-minoo commented 2 years ago

Should I then split this into 3 separate challenges?

  1. A spec/ontology to describe the workflows used by the orchestrator
  2. A spec for the configuration of the processors using the ontology in step 1
  3. A CLI orchestrator tool that uses the configurations to execute the pipeline

pheyvaer commented 2 years ago

Yes, that would indeed be a good start! We can always refine, adjust, or add more challenges as work is done.

RubenVerborgh commented 2 years ago

Will need to be applied to a use case, so the task can be finished.

pheyvaer commented 1 year ago

@s-minoo Did you have the chance to look into making the necessary changes?

s-minoo commented 1 year ago

This challenge has been split up into 3 smaller challenges #50 #51 #52. Is it okay if I just refer to them?

RubenVerborgh commented 1 year ago

That's okay! We can update the description and/or close this one then.

pheyvaer commented 1 year ago

@s-minoo Can you either close this one or update its description? Thanks!

s-minoo commented 1 year ago

Edited and I'll close this too!