alibaba / pipcook

Machine learning platform for Web developers
https://alibaba.github.io/pipcook/
Apache License 2.0

Introducing a way to create pipeline dynamically #759

Open yorkie opened 3 years ago

yorkie commented 3 years ago

To improve the experience of developing a pipeline, I'd like to propose writing pipelines as scripts, which should make ML development easier.

// load pipcook core APIs.
import pipcook from '@pipcook/core';

// load scripts
import createDataset from 'https://cdn.jsdelivr.net/gh/imgcook/pipcook-plugin-image-classification-collector@d00337c/build/script.js';
import createModel from 'https://cdn.jsdelivr.net/gh/imgcook/pipcook-plugin-tfjs-mobilenet-model@a95d0de/build/script.js';
import updateImageMetadata from 'https://cdn.jsdelivr.net/gh/imgcook/pipcook-plugin-image-metadata@db14a1a/build/script.js';
import resizeImage from 'https://cdn.jsdelivr.net/gh/imgcook/pipcook-plugin-process-tfjs-image-classification@db14a1a/build/script.js';
import greyImage from 'https://cdn.jsdelivr.net/gh/imgcook/pipcook-plugin-grey-image@foobar/build/script.js';

const pipeline = new pipcook.Pipeline('mobilenet@1.0.0', {
  // pipeline options
  train: {
    epochs: 20,
    validationRequired: true
  }
});

// create plugins
const wasmify = new pipcook.Plugin('pipcook-tvm-wasmify@0.0.1');
const dest = new pipcook.Plugin('pipcook-artifact-zip@0.0.2', { target: '/tmp/mobilenet-model.zip' });

// load the dataset
const dataset = await createDataset('https://ai-sample.oss-cn-hangzhou.aliyuncs.com/image_classification/datasets/imageclass-test.zip');

// process on the whole dataset
await updateImageMetadata(dataset);
// process on every batch
resizeImage(dataset, { size: 224 });
greyImage(dataset);

// here we can create the model dynamically
const mobilenet = await createModel(dataset);
// operate on mobilenet, for example modifying the layers and ops

// get notified when the pipeline has finished
pipeline.once('finished', () => {
  console.log('train is finished');
});
// start running the pipeline.
pipeline
  .pipe(wasmify)
  .pipe(dest)
  .run();
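
To make the chaining at the end of the script concrete, here is a minimal sketch of how `pipe(...)`, `run()` and the `'finished'` event could fit together. Everything in it is an assumption for illustration: `SketchPipeline` is a hypothetical class, not pipcook's `Pipeline`, the two stage functions are stand-ins for the `wasmify` and `dest` plugins, and passing an input to `run()` is a simplification that the proposal above does not specify.

```javascript
// Hypothetical sketch: SketchPipeline is NOT pipcook's Pipeline class.
// It only illustrates pipe(...).run() chaining and the 'finished' event.
class SketchPipeline {
  constructor(name, options = {}) {
    this.name = name;
    this.options = options;
    this.stages = []; // functions applied in the order they were piped
    this.onceListeners = new Map(); // event name -> one-shot callbacks
  }

  // Register a callback that fires at most once for the given event.
  once(event, listener) {
    const list = this.onceListeners.get(event) || [];
    list.push(listener);
    this.onceListeners.set(event, list);
    return this;
  }

  // Append a stage and return `this` so calls can be chained.
  pipe(stage) {
    this.stages.push(stage);
    return this;
  }

  // Run the stages sequentially, feeding each one the previous result,
  // then notify the one-shot 'finished' listeners.
  async run(input) {
    let value = input;
    for (const stage of this.stages) {
      value = await stage(value, this.options);
    }
    const listeners = this.onceListeners.get('finished') || [];
    this.onceListeners.delete('finished');
    for (const listener of listeners) listener(value);
    return value;
  }
}

// Stand-ins for the wasmify and dest plugins (assumed behavior).
const wasmify = async (model) => ({ ...model, format: 'wasm' });
const dest = async (model) => ({ ...model, target: '/tmp/mobilenet-model.zip' });

const sketch = new SketchPipeline('mobilenet@1.0.0', { train: { epochs: 20 } });
sketch.once('finished', (artifact) => console.log('finished:', artifact.target));
const done = sketch.pipe(wasmify).pipe(dest).run({ name: 'mobilenet' });
```

Returning `this` from `pipe` is what allows the fluent chain, and resolving `run()` with the final value lets callers either `await` it or rely on the event.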

yorkie commented 3 years ago

We would then be able to use RxJS or streaming APIs together with dataflow scripts to describe DAGs for more complex flows.
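
As a rough illustration of what describing a DAG in a script could look like, here is a minimal sketch in plain JavaScript. It does not use RxJS or any pipcook API: the graph shape (`deps` plus `run`), the helper names `topoOrder` and `runDag`, and the node names in `flow` are all hypothetical, and the numeric arrays merely stand in for real dataset batches.

```javascript
// Hypothetical sketch: none of these names come from pipcook or RxJS.
// A DAG is described as { nodeName: { deps: [...], run: async (inputs) => output } }.

// Return node names in an order where every node follows its dependencies
// (Kahn's algorithm). Throws on unknown dependencies and on cycles.
function topoOrder(graph) {
  const names = Object.keys(graph);
  const remaining = new Map(); // node -> number of unmet dependencies
  const dependents = new Map(); // node -> nodes that wait on it
  for (const name of names) {
    remaining.set(name, graph[name].deps.length);
    dependents.set(name, []);
  }
  for (const name of names) {
    for (const dep of graph[name].deps) {
      if (!dependents.has(dep)) throw new Error(`unknown dependency: ${dep}`);
      dependents.get(dep).push(name);
    }
  }
  const ready = names.filter((name) => remaining.get(name) === 0);
  const order = [];
  while (ready.length > 0) {
    const name = ready.shift();
    order.push(name);
    for (const next of dependents.get(name)) {
      remaining.set(next, remaining.get(next) - 1);
      if (remaining.get(next) === 0) ready.push(next);
    }
  }
  if (order.length !== names.length) throw new Error('cycle detected');
  return order;
}

// Run every node once, passing it the results of its dependencies.
async function runDag(graph) {
  const results = {};
  for (const name of topoOrder(graph)) {
    const inputs = graph[name].deps.map((dep) => results[dep]);
    results[name] = await graph[name].run(inputs);
  }
  return results;
}

// Example flow: one dataset feeds two independent branches that are merged.
const flow = {
  dataset: { deps: [], run: async () => [1, 2, 3] },
  resize: { deps: ['dataset'], run: async ([d]) => d.map((x) => x * 2) },
  grey: { deps: ['dataset'], run: async ([d]) => d.map((x) => x + 1) },
  merge: { deps: ['resize', 'grey'], run: async ([a, b]) => a.concat(b) },
};
```

Calling `runDag(flow)` resolves to an object keyed by node name. This sketch runs nodes one at a time for simplicity; a streaming or RxJS-based variant could run independent branches such as `resize` and `grey` concurrently.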