Recover Pipeline Architecture Type

byu-dml / d3m-experimenter

A distributed system for creating, running, and persisting many machine learning experiments.

Recover Pipeline Architecture Type #58

Closed epeters3 closed 5 years ago

epeters3 commented 5 years ago

Since we are generating pipelines with several kinds of architectures, we need an easy way to tell, given a created pipeline document, what its pipeline architecture is. Perhaps adding a basic "arch_type" field to each pipeline JSON document before it is written to the Mongo DB would be enough. Possible values for the field could be one of ["straight-<k>", "ensemble-<k>", "random", "stacked-<k>"], where <k> is a relevant value, e.g. "straight-5" would mean it's a straight pipeline of length 5, "ensemble-3" would mean it's an ensemble of three pipelines, etc.
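To make the proposed string format concrete, here is a minimal sketch of how such values could be built and parsed. The helper names `make_arch_type` and `parse_arch_type` are hypothetical, and the sketch assumes <k> is always a positive integer and that "random" never carries a suffix.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers: the names and the strict "<kind>-<k>" format
# are assumptions made for illustration only.
KINDS_WITH_K = ("straight", "ensemble", "stacked")
ARCH_TYPE_PATTERN = re.compile(r"^(straight|ensemble|stacked)-(\d+)$")


def make_arch_type(kind: str, k: Optional[int] = None) -> str:
    """Build an arch_type string such as "straight-5" or "random"."""
    if kind == "random":
        return "random"
    if kind not in KINDS_WITH_K:
        raise ValueError(f"unknown architecture kind: {kind!r}")
    if k is None or k < 1:
        raise ValueError(f"{kind!r} needs a positive integer k")
    return f"{kind}-{k}"


def parse_arch_type(arch_type: str) -> Tuple[str, Optional[int]]:
    """Split an arch_type string back into (kind, k); k is None for "random"."""
    if arch_type == "random":
        return ("random", None)
    match = ARCH_TYPE_PATTERN.match(arch_type)
    if match is None:
        raise ValueError(f"malformed arch_type: {arch_type!r}")
    return (match.group(1), int(match.group(2)))
```

Packing the value into one string keeps the field simple, but every consumer then has to parse it, which is one reason a structured field might be preferable.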

epeters3 commented 5 years ago

I have a branch up for this (track-arch-type). I'm just waiting on some more info from our D3M friends before I make some final changes and open a PR.

bjschoenfeld commented 5 years ago

Is this information based on how the pipeline was generated or on what the pipeline looks like? For example, a randomly generated pipeline could look like a straight pipeline, an ensemble, etc.

epeters3 commented 5 years ago

For simplicity's sake, I have implemented it to track only how the pipeline was generated. For an ensemble, I'm tracking this data:

{
    "pipeline_type": "ensemble",
    "attributes": {
        "width": 3,
        "subpipeline_length": 2
    }
}
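As a rough sketch of how this data could be attached to a pipeline document before it is persisted: the function name `tag_pipeline` and the placeholder document are hypothetical, and placing the two fields at the top level of the document is an assumption, since the thread does not say where they live.

```python
import copy
from typing import Any, Dict


def tag_pipeline(
    pipeline_doc: Dict[str, Any], pipeline_type: str, **attributes: Any
) -> Dict[str, Any]:
    """Return a copy of the document with generation info added.

    Hypothetical helper: writing the fields at the top level of the
    document is an assumption made for illustration.
    """
    tagged = copy.deepcopy(pipeline_doc)  # leave the caller's document untouched
    tagged["pipeline_type"] = pipeline_type
    tagged["attributes"] = dict(attributes)
    return tagged


# Placeholder document; a real pipeline document has many more fields.
doc = {"id": "example-pipeline", "steps": []}
tagged = tag_pipeline(doc, "ensemble", width=3, subpipeline_length=2)
```

Returning a copy rather than mutating the input keeps the generator's in-memory pipeline separate from what gets written to the database.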

For a randomly generated pipeline (which isn't implemented yet and won't be a part of the PR addressing this issue), we could track data like this:

{
    "pipeline_type": "random",
    "attributes": {
        "depth": 4,
        "max_width": 3,
        "max_num_inputs": 3
    }
}
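Since each generation method carries its own set of attributes, a small check before writing could catch mismatches. This is only a sketch: `EXPECTED_ATTRIBUTES` and `check_generation_info` are hypothetical names, and the key sets are taken from the two examples in this thread rather than from any agreed schema.

```python
from typing import Any, Dict, Set

# Key sets copied from the two examples in this thread; treating them
# as required (and exhaustive) is an assumption.
EXPECTED_ATTRIBUTES: Dict[str, Set[str]] = {
    "ensemble": {"width", "subpipeline_length"},
    "random": {"depth", "max_width", "max_num_inputs"},
}


def check_generation_info(info: Dict[str, Any]) -> None:
    """Raise ValueError if the attributes do not match the pipeline type."""
    pipeline_type = info["pipeline_type"]
    if pipeline_type not in EXPECTED_ATTRIBUTES:
        raise ValueError(f"unknown pipeline_type: {pipeline_type!r}")
    expected = EXPECTED_ATTRIBUTES[pipeline_type]
    actual = set(info["attributes"])
    if actual != expected:
        raise ValueError(
            f"{pipeline_type!r} expects attributes {sorted(expected)}, "
            f"got {sorted(actual)}"
        )


random_info = {
    "pipeline_type": "random",
    "attributes": {"depth": 4, "max_width": 3, "max_num_inputs": 3},
}
check_generation_info(random_info)  # passes silently
```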

epeters3 commented 5 years ago

Since these fields will be persisted in the D3M MtL DB, it would be good to make the field names more appropriate for that scope. For example, "pipeline_type" could become "pipeline_generation_method" or something like that, and "attributes" could become "generation_parameters".
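If the rename goes ahead, already-tagged documents could be converted with something like the sketch below. The new names are only the tentative suggestions from this comment, and `rename_generation_fields` is a hypothetical helper.

```python
from typing import Any, Dict

# Tentative mapping from the current field names to the suggested ones;
# the final names have not been decided in this thread.
FIELD_RENAMES = {
    "pipeline_type": "pipeline_generation_method",
    "attributes": "generation_parameters",
}


def rename_generation_fields(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with the generation fields renamed; other keys are kept."""
    return {FIELD_RENAMES.get(key, key): value for key, value in doc.items()}


old_doc = {
    "pipeline_type": "ensemble",
    "attributes": {"width": 3, "subpipeline_length": 2},
}
new_doc = rename_generation_fields(old_doc)
```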