byu-dml / d3m-experimenter

A distributed system for creating, running, and persisting many machine learning experiments.

High level design should be updated #41

Open orionw opened 5 years ago

orionw commented 5 years ago

The experimenter can perform a number of functions, each with a variety of subfunctions. We should consider breaking these out so that each can be run and unit tested more easily.

We have:

  1. Pipeline Generation
    • Regular pipelines
    • Ensemble pipelines
      • Combinations of preprocessors
      • Combinations of models
      • Combinations of how many straight pipelines to ensemble
    • Metafeature pipeline for gathering metafeatures
  2. Gather problems and datasets
    • Connecting to Mongo
    • D3M validation of schema
  3. Executing Pipelines
    • Which problems, which pipelines?
      • Which order to run
    • Locally
    • Distributed
      • Which priority
  4. Having workers join the distributed system
    • Adding / deleting / reassigning workers
    • Debugging failures on the queue / monitoring

These should probably be more modular. They are all currently under one command-line option that is too complex and lacks options for generating ensemble pipelines. The only area I feel is decent is the Mongo connection class.

Ideally, breaking the functionality apart would make this easier to test, and thus easier to develop while making sure nothing breaks.
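
As a rough illustration of what "more modular" could look like, here is a minimal sketch that gives each of the four areas above its own subcommand and handler. Everything in it is an assumption for discussion: the `experimenter` program name, the subcommand names, the flags, and the handler functions are hypothetical and are not the repo's current interface. The handlers only return a description of what they would do, so each one can be unit tested in isolation.

```python
import argparse


# Hypothetical handlers: one per area in the list above. Each returns a
# description of the work instead of doing it, to keep the sketch testable.
def generate_pipelines(args: argparse.Namespace) -> str:
    return f"generate {args.kind} pipelines"


def gather_problems(args: argparse.Namespace) -> str:
    return "gather problems and datasets"


def execute_pipelines(args: argparse.Namespace) -> str:
    return f"execute pipelines ({args.mode})"


def manage_workers(args: argparse.Namespace) -> str:
    return f"{args.action} worker"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per area (all names are assumptions)."""
    parser = argparse.ArgumentParser(prog="experimenter")
    subparsers = parser.add_subparsers(dest="command", required=True)

    generate = subparsers.add_parser("generate", help="Pipeline generation")
    generate.add_argument(
        "--kind",
        choices=["regular", "ensemble", "metafeature"],
        default="regular",
    )
    generate.set_defaults(handler=generate_pipelines)

    gather = subparsers.add_parser("gather", help="Gather problems and datasets")
    gather.set_defaults(handler=gather_problems)

    execute = subparsers.add_parser("execute", help="Execute pipelines")
    execute.add_argument("--mode", choices=["local", "distributed"], default="local")
    execute.set_defaults(handler=execute_pipelines)

    worker = subparsers.add_parser("worker", help="Manage distributed workers")
    worker.add_argument("action", choices=["add", "delete", "reassign"])
    worker.set_defaults(handler=manage_workers)

    return parser


def dispatch(argv: list) -> str:
    """Parse argv and call the handler attached to the chosen subcommand."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

With this layout, something like `dispatch(["generate", "--kind", "ensemble"])` exercises only the pipeline-generation path, which is the kind of separation that would make targeted unit tests straightforward.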

epeters3 commented 4 years ago

A lot of work has been done on this front, but some work could still be done. It is probably not high priority for the lab right now; I'm making a note here just to document things:

What's been done:

What could still be done: