edge-ml / ml

machine learning component

Gameplan #1

Closed: TobiasRoeddiger closed this issue 2 years ago

TobiasRoeddiger commented 2 years ago

We want to have a first full pipeline as soon as possible. Therefore, we need to come up with a strategy that lets us work towards that goal efficiently and iteratively. Please feel free to share your thoughts and suggestions below.

Next Sprint (~ 8 weeks)

Next Next Sprint (~ 16 weeks)

Next Next Next Sprint (~ 24 weeks)

Other Stuff

TobiasRoeddiger commented 2 years ago

NOTE

Currently, this does not consider what we have already implemented here. I think we can learn from it and draw some inspiration for our architecture.

Architecture Thoughts

Features we need:

Internal Structure

abstract class EdgeModel

+ abstract get_hyperparameters();
+ abstract fit(X, y, hyperparameters);
+ abstract predict(X);
+ abstract compileFirmware(targetPlatform); // generates the binary for the target platform

- window_data(X, y, width, stride); // will be called before fit, returns tuple of data and labels
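window_data() is only described by its signature. A minimal Python sketch, assuming X and y are sample-aligned sequences (one feature vector and one label per sample). Labelling each window with the majority label of its samples and dropping an incomplete trailing window are both assumptions, not part of the plan:

```python
from collections import Counter
from typing import List, Sequence, Tuple


def window_data(
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    width: int,
    stride: int,
) -> Tuple[List[List[Sequence[float]]], List[str]]:
    """Split per-sample data into fixed-size sliding windows.

    Returns (windows, window_labels), one label per window.
    """
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    windows: List[List[Sequence[float]]] = []
    window_labels: List[str] = []
    # Move a window of `width` samples forward by `stride` samples per step;
    # samples that do not fill a complete trailing window are dropped.
    for start in range(0, len(X) - width + 1, stride):
        windows.append(list(X[start:start + width]))
        # Assumption: a window takes the most frequent label among its samples.
        window_labels.append(Counter(y[start:start + width]).most_common(1)[0][0])
    return windows, window_labels
```

With six samples, width=3 and stride=1 produce four overlapping windows, while stride=3 produces two disjoint ones.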
class RandomForest extends EdgeModel

+ get_hyperparameters(); // call super and add own hyperparameters
+ fit(X, y, hyperparameters);
+ predict(X);
+ compileFirmware(targetPlatform);
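The outline maps naturally onto a Python abstract base class. The sketch below is an illustration, not the agreed design: get_hyperparameters() is abstract but has a default body so subclasses can call super(), as the RandomForest comment suggests. The shared window_width/window_stride parameters, the n_estimators parameter, the snake_case name compile_firmware (compileFirmware in the outline), and scikit-learn's RandomForestClassifier as the backend are all assumptions:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence

Meta = Dict[str, Dict[str, Any]]  # hyperparameter meta format, one entry per parameter


class EdgeModel(ABC):
    @abstractmethod
    def get_hyperparameters(self) -> Meta:
        # Parameters shared by all models. Assumption: the window width and
        # stride used by window_data() are exposed here so every model gets them.
        return {
            "window_width": {"type": "number", "min": 1, "precision": "int", "required": True},
            "window_stride": {"type": "number", "min": 1, "precision": "int", "required": True},
        }

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str],
            hyperparameters: Dict[str, Any]) -> "EdgeModel": ...

    @abstractmethod
    def predict(self, X: Sequence[Sequence[float]]) -> List[str]: ...

    @abstractmethod
    def compile_firmware(self, target_platform: str) -> bytes:
        """Generate the binary for the target platform."""


class RandomForest(EdgeModel):
    def __init__(self) -> None:
        self._clf: Any = None  # set by fit()

    def get_hyperparameters(self) -> Meta:
        params = super().get_hyperparameters()  # keep the shared parameters
        params["n_estimators"] = {  # hypothetical model-specific parameter
            "type": "number", "min": 1, "max": 500, "inclusiveMin": True,
            "inclusiveMax": True, "precision": "int", "required": False,
        }
        return params

    def fit(self, X, y, hyperparameters):
        # Imported lazily so models can be listed without scikit-learn installed.
        # X must already be a 2-D feature matrix (e.g. features per window).
        from sklearn.ensemble import RandomForestClassifier

        self._clf = RandomForestClassifier(n_estimators=hyperparameters.get("n_estimators", 100))
        self._clf.fit(X, y)
        return self

    def predict(self, X):
        if self._clf is None:
            raise RuntimeError("fit() must be called before predict()")
        return list(self._clf.predict(X))

    def compile_firmware(self, target_platform):
        # Placeholder: turning the trained forest into firmware is not sketched here.
        raise NotImplementedError(f"no firmware backend for {target_platform!r} yet")
```

Because scikit-learn is imported inside fit(), the backend can list models and their hyperparameter meta data without loading any training library.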
Hyperparameter meta format, e.g. as returned by get_hyperparameters():

{
  param1: { ... },
  param2: {
    type: 'selection',
    options: ['full', 'deep', 'half'],
    required: true, // this parameter has to be selected
    multiSelect: false // determines whether multiple options can be selected
  },
  param3: {
    type: 'number', // select a value by typing a number
    min: 0,
    max: 5,
    inclusiveMin: true, // whether min and max themselves can be selected
    inclusiveMax: false,
    precision: 'int', // 'float' is also possible
    required: false
  },
  param4: {
    type: 'boolean', // can only be true or false
    required: false
  }
}
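To make the semantics of the meta format concrete, here is a hypothetical validator (validate_hyperparameters is not an existing function) that checks submitted values against such a spec. It assumes inclusiveMin/inclusiveMax default to true when omitted and that keys not described by the spec are errors:

```python
from typing import Any, Dict, List

Meta = Dict[str, Dict[str, Any]]


def validate_hyperparameters(meta: Meta, values: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means `values` is valid."""
    errors: List[str] = []
    for name, spec in meta.items():
        if name not in values:
            if spec.get("required", False):
                errors.append(f"{name}: required but missing")
            continue
        value, kind = values[name], spec.get("type")
        if kind == "selection":
            # Single-select expects one option; multiSelect expects a list of options.
            options = spec.get("options", [])
            chosen = value if spec.get("multiSelect", False) else [value]
            if not isinstance(chosen, list) or any(c not in options for c in chosen):
                errors.append(f"{name}: must be chosen from {options}")
        elif kind == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: must be a number")
                continue
            if spec.get("precision") == "int" and not isinstance(value, int):
                errors.append(f"{name}: must be an integer")
            low, high = spec.get("min"), spec.get("max")
            # inclusiveMin/inclusiveMax decide whether the bounds themselves are allowed.
            too_low = low is not None and (value < low if spec.get("inclusiveMin", True) else value <= low)
            too_high = high is not None and (value > high if spec.get("inclusiveMax", True) else value >= high)
            if too_low or too_high:
                errors.append(f"{name}: {value} is outside the allowed range")
        elif kind == "boolean":
            if not isinstance(value, bool):
                errors.append(f"{name}: must be true or false")
        else:
            errors.append(f"{name}: unknown type {kind!r}")
    # Assumption: keys not described by the meta format are rejected.
    for name in sorted(set(values) - set(meta)):
        errors.append(f"{name}: unknown hyperparameter")
    return errors
```

For the example above, {"param3": 5} yields two errors: param2 is required, and 5 is excluded because inclusiveMax is false.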

REST API

GET - /model

returns the list of available models that can be trained

Each model has the following information:

{
  id: "sdfasdfasf34f334f34f",
  name: "RandomForest",
  hyperparameters: META_JSON_FORMATTED_HYPERPARAMETERS, // can be parsed by the frontend to show configuration UI
  isPro: false, // determines if the model can only be used by Pro users
}

POST - /model/:id

trains the model based on the given parameters

accepts

{
  project: "sdfoi3hf09whf0923hd", // project id for which the model is created
  datasets: ["sadfsdf4f34f", "sdaf34g45g9j45g", ...], // list of dataset ids to use for model training and testing
  labels: ["4456432f44v45ff", "34f234f425fmm4"], // label ids to consider for the classification task
  hyperparameters: { ... }, // as specified from the meta format
}
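A server handling this endpoint has to check the body before training starts. A minimal sketch using only the standard library; TrainRequest and parse_train_request are hypothetical names, and treating hyperparameters as optional here (to be checked against the meta format afterwards) is an assumption. A real handler would presumably answer a ValueError with a 4xx status instead of 200:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TrainRequest:
    project: str
    datasets: List[str]
    labels: List[str]
    hyperparameters: Dict[str, Any] = field(default_factory=dict)


def _id_list(body: Dict[str, Any], key: str) -> List[str]:
    value = body.get(key)
    if not isinstance(value, list) or not value or not all(isinstance(v, str) for v in value):
        raise ValueError(f"'{key}' must be a non-empty list of id strings")
    return value


def parse_train_request(raw: str) -> TrainRequest:
    """Parse the JSON body of POST /model/:id; raises ValueError on bad input."""
    body = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    project = body.get("project")
    if not isinstance(project, str) or not project:
        raise ValueError("'project' must be a non-empty id string")
    hyperparameters = body.get("hyperparameters", {})
    if not isinstance(hyperparameters, dict):
        raise ValueError("'hyperparameters' must be a JSON object")
    return TrainRequest(project, _id_list(body, "datasets"), _id_list(body, "labels"), hyperparameters)
```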

returns

200

GET - /model/trained

returns the list of available models that were trained

list of trained models

GET - /model/trained/:id

returns a specific model that was trained

{
  modelId: "sdfasdfasf34f334f34f", // id of the model used for training
  hyperparameters: { ... }, // as specified before training
  confusionMatrix: { ... }, // the confusion matrix of all classes
  size: 140.3, // the size of the model in kB
}
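The format of confusionMatrix is left open ({ ... }). One JSON-friendly option, assumed here, is a nested object keyed first by actual class and then by predicted class. The sketch below builds it from true and predicted labels and derives accuracy from its diagonal; the function names are hypothetical:

```python
from typing import Dict, List, Sequence

Matrix = Dict[str, Dict[str, int]]  # actual class -> predicted class -> count


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], classes: List[str]) -> Matrix:
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Start every cell at zero so classes that never occur still appear in the output.
    matrix: Matrix = {actual: {pred: 0 for pred in classes} for actual in classes}
    for actual, pred in zip(y_true, y_pred):
        matrix[actual][pred] += 1  # raises KeyError for labels outside `classes`
    return matrix


def accuracy(matrix: Matrix) -> float:
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)  # diagonal = correct predictions
    return correct / total if total else 0.0
```

For y_true = ["walk", "walk", "run", "run"] and y_pred = ["walk", "run", "run", "run"] (made-up class names), matrix["walk"]["run"] is 1 and the accuracy is 0.75.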
riedel commented 2 years ago

I talked to @riedel about this and we are thinking about running a hackathon to build an initial feature extraction library that we can use on Arduino and also call from Python on edge-ml. This way we could keep the same logic across platforms.

One more idea for discussion: we could maybe even open this hackathon to externals. I would actually give the hackathon a broader scope, also including AutoML, etc.

TobiasRoeddiger commented 2 years ago

Yes. That would be really nice once we have the meta architecture ready.

riedel commented 2 years ago

Maybe it makes sense to do this in a two-step approach, then:

  1. internal hackathon: build prototypes without a fixed meta architecture
  2. broader hackathon: use the "architecturized" examples from the first hackathon as its basis

TobiasRoeddiger commented 2 years ago

Yes, good idea. The Python library to pull data is almost ready. For the internal hackathon, different people would build different models from the data they pull with the Python library. This way we could better understand the requirements for the meta architecture and also validate that data collection and labeling work as expected.

How relevant will porting to the edge device be for the hackathon?

TobiasRoeddiger commented 2 years ago

Latest update:

@KtrauM, maybe we can add a sensor filter to the notebook, too? This would avoid some bugs (e.g., define ACC_x etc. as target sensors at the start). Otherwise, what happens if a project has different sensors in some datasets? With a filter, we could simply drop datasets that don't have all target sensors and ignore the non-target sensor streams.
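The sensor filter suggested above could look roughly like this in the notebook. It is a minimal sketch that assumes each dataset has already been pulled into a dict mapping sensor names to sample lists; that layout and the name filter_sensors are hypothetical, not the Python library's actual format:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: dataset id -> {sensor name -> samples}
Datasets = Dict[str, Dict[str, List[float]]]


def filter_sensors(datasets: Datasets, target_sensors: Sequence[str]) -> Tuple[Datasets, List[str]]:
    """Keep only datasets that contain every target sensor, and only those streams.

    Returns (kept_datasets, dropped_dataset_ids).
    """
    targets = set(target_sensors)
    kept: Datasets = {}
    dropped: List[str] = []
    for dataset_id, streams in datasets.items():
        if not targets.issubset(streams):
            dropped.append(dataset_id)  # missing at least one target sensor
            continue
        # Non-target streams are ignored; streams follow the order of target_sensors.
        kept[dataset_id] = {name: streams[name] for name in target_sensors}
    return kept, dropped
```

Returning the dropped ids makes it easy to report which datasets were skipped instead of silently training on fewer of them.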