flipkart-incubator / databuilderframework

A data driven execution engine
33 stars 29 forks source link

Data Builder FrameworkTravis build status

DataBuilder framework is a high level logic execution engine that can be used to execute multi-step workflows. This engine currently powers the checkout system as well as diagnostics and other workflows at flipkart. You should look at this framework for the following scenarios:

A few examples of the above would be:

Features

The following are the salient features:

Basics

Data Builder Framework is conceptually inspired by build systems in general and Makefiles in particular.

Terms

Before we get into the nitty-gritty details, lets go over the basic terminology:

Sample DataFlows

Image upload

Usage

The library can be used directly from maven, or from local.

Build instructions

Maven Dependency

Use the following repository:

<repository>
    <id>clojars</id>
    <name>Clojars repository</name>
    <url>https://clojars.org/repo</url>
</repository>

Use the following maven dependency:

<dependency>
    <groupId>com.flipkart.databuilderframework</groupId>
    <artifactId>databuilderframework</artifactId>
    <version>0.5.11</version>
</dependency>

Framework usage

The framework can be used in two modes:

We will go over both the modes one by one. However, the basic theory remains the same. So, let's take a look at the basics first.

Basics

The basic flow is the following:

Creating data

You need to implement the Data class if you want to explicitly name your data:

public class TestDataA extends Data {
    //Members
    public TestDataA(...) {
        super("A");
        //
    }
    //Accessors etc
}

If class name is sufficient as the name of your data you can use the DataAdapter generic class:

public class TestDataA extends DataAdapter<TestDataA> {
    // Members
    public TestDataA(...) {
        super(TestDataA.class);
        //
    }
    //Accessors etc
}

Creating a DataBuilder

DataBuilders have to implement the DataBuilder class and override the process() function:

public class TestBuilderA extends DataBuilder {

    @Override
    public Data process(DataBuilderContext context) {
        //Access already generated and/or provided data
        DataSetAccessor dataSetAccessor = context.getDataSet().accessor();
        TestDataA a = dataSetAccessor.get("A", TestDataA.class); //If name of data was A
        TestDataB b = dataSetAccessor.get(TestDataB.class); //If data was derived from DataAdapter<TestDataB>

        //Do something and Generate new data
        return new TestDataC(a.getValue() + " " + b.getValue());
    }
}

Metadata annotations

Creating an executor

There are two types of executors:

Both executors are safe and can be shared between threads. There is no need to create executors multiple times. Also, executors provide hooks for handlers that will be invoked during flow executions. See JavaDoc for details.

final DataFlowExecutor executor
                            = new MultiThreadedDataFlowExecutor(Executors.newFixedThreadPool(10));

Creating DataFlows

Data flows are created using the DataFlowBuilder. Creating a data-flow is a relatively expensive operation. A data flow object is safe and internals are immutable. As such it should be reused and shared across requests and threads.

Create data-flow with known builders

Creating a data flow where you know the participating builders is dead simple.

final DataFlow imageUploadFlow = new DataFlowBuilder()
                                .withDataBuilder(new ImageStoreSaver())
                                .withDataBuilder(new ColorExtractor())
                                .withDataBuilder(new CurveExtractor())
                                .withDataBuilder(new ExifExtractor())
                                .withDataBuilder(new PatternExtractor())
                                .withDataBuilder(new ImageValidator())
                                .withDataBuilder(new ImageIndexer())
                                .withTargetData(ImageSavedResponse.class)
                                .build();

The above will stitch the flow based on DataBuilderClassInfo annotations on the different builders.

Create flows using pre-discovered builders

Frequently your application would have a bunch of builders that expose different computations in the system. And you would not want to hardcode flows but be able to build them on the fly using APIs to generate newer workflows without deploying new code.

In this scenario you'd want to use something like Reflections library to discover such computations and register them in DataBuilderMetadataManager. Use this DataBuilderMetadataManager in the DataFlowBuilder and provide a target data; the system builds a data flow on the fly, selecting the relevant builders and connecting them properly to be able to generate the final data. In the case where multiple known builders generate the final data, a resolution spec can be specified as well while building the flow. You'd typically want to cache/store these computed data flow and execute requests.

Registering builders into DataBuilderMetadataManager

Err, well, call the register function.

DataBuilderMetadataManager dataBuilderMetadataManager = new DataBuilderMetadataManager();

//Typically the register function would be called in a loop over
//annotated DataBuilder implementations found in the classpath
dataBuilderMetadataManager
                    .register(ImageStoreSaver.class)
                    .register(ColorExtractor.class)
                    .register(CurveExtractor.class)
                    .register(ExifExtractor.class)
                    .register(PatternExtractor.class)
                    .register(ImageValidator.class)
                    .register(ImageIndexer.class)
                    .register(ImageSavedResponse.class)
                    .register(ImageResponseGenerator.class);
Build DataFlows

Build data flows by specifying the targets

final DataFlow imageUploadFlow = new DataFlowBuilder()
                                            .withMetaDataManager(dataBuilderMetaDataManager)
                                            .withName("ImageUpload")
                                            .withTargetData(ImageSavedResponse.class)
                                            .build();

final DataFlow imageGetFlow = new DataFlowBuilder()
                                            .withMetaDataManager(dataBuilderMetaDataManager)
                                            .withName("ImageGet")
                                            .withTargetData(ImageDetailsResponse.class)
                                            .build();

Execute requests

Executions can be on a single request or in checkout type scenarios in multiple requests.

Execute in single run

Get input data, execute and return the values.

public ImageSavedResponse save(final Image image) throws Exception {
    DataFlowExecutionResponse response = executor.run(imageUploadFlow, image);
    return response.get(ImageSavedResponse.class);
}

Execute across multiple requests

In this case, executions are more complex and goes like this:

Create a DataFlowInstance
DataFlowInstance instance = new DataFlowInstance("test-123", checkoutFlow.deepCopy());

Execute the flow

Example: Multi-step checkout

//Step 1: Login and address
response = executor.run(instance, userDetails);
//Step 2: Cart details and edits
response = executor.run(instance, cart);
//Step 3: Payments
response = executor.run(instance, payment);
//Done

Example: Single page checkout

response = executor.run(instance, userDetails, cart, payment);
//Done

That's it. The libary is extensively tested and documented. Go through the JavaDocs, there are a lot of details present on the same.

Version

0.5.3

Contribution, Bugs and Feedback

For bugs, questions and discussions please use the Github Issues.

If you would like to contribute code you can do so through GitHub by forking the repository and sending a pull request.

When submitting code, please make every effort to follow existing conventions and style in order to keep the code as readable as possible.

Contribution License

By contributing your code, you agree to license your contribution under the terms of the APLv2: http://www.apache.org/licenses/LICENSE-2.0

All files are released with the Apache 2.0 license.

If you are adding a new file it should have a header like this:

/**
 * Copyright 2015 Flipkart Internet Pvt. Ltd.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

LICENSE

Copyright 2015 Flipkart Internet Pvt. Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.