Closed epeters3 closed 5 years ago
I have a branch up for this (track-arch-type). I'm just waiting on some more info from our D3M friends before I make some final changes and open a PR.
Is this information based on how the pipeline was generated or on what the pipeline looks like? For example, a randomly generated pipeline could end up looking like a straight pipeline, an ensemble, etc.
For simplicity's sake, I have implemented it to track only how the pipeline was generated. For an ensemble, I'm tracking this data:
{
  "pipeline_type": "ensemble",
  "attributes": {
    "width": 3,
    "subpipeline_length": 2
  }
}
For a randomly generated pipeline (which isn't implemented yet and won't be a part of the PR addressing this issue), we could track data like this:
{
  "pipeline_type": "random",
  "attributes": {
    "depth": 4,
    "max_width": 3,
    "max_num_inputs": 3
  }
}
Since these fields will be persisted in the D3M MtL DB, it would be good to make the field names more appropriate for that scope. For example, maybe "pipeline_type" should be "pipeline_generation_method" or something like that, and "attributes" should be "generation_parameters".
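To make the renaming suggestion concrete, here is a minimal Python sketch of attaching the generation metadata to a pipeline document under the proposed field names. The helper name with_generation_metadata and the trimmed example document are hypothetical and not taken from the branch; the field names are only the ones floated above and may not be what ends up in the DB.

```python
import copy
import json


def with_generation_metadata(pipeline_doc, method, parameters):
    """Return a copy of pipeline_doc with generation metadata attached.

    Hypothetical helper: the two field names are the ones proposed in this
    thread, not a confirmed schema.
    """
    doc = copy.deepcopy(pipeline_doc)  # leave the caller's document untouched
    doc["pipeline_generation_method"] = method
    doc["generation_parameters"] = dict(parameters)
    return doc


# Hypothetical, heavily trimmed pipeline document used only for illustration.
pipeline = {"id": "example-pipeline", "steps": []}

ensemble = with_generation_metadata(
    pipeline, "ensemble", {"width": 3, "subpipeline_length": 2}
)
print(json.dumps(ensemble, indent=2))
```

Returning a copy rather than mutating in place is just one design choice; it keeps the original document reusable if the same base pipeline is tagged more than once.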
Since we are generating pipelines with multiple kinds of architectures, we need a way to easily tell, given a created pipeline document, what its pipeline architecture is. Perhaps adding a basic "arch_type" field to each pipeline JSON document before it is written to the Mongo DB would be enough. Possible values for the field could be one of ["straight-<k>", "ensemble-<k>", "random", "stacked-<k>"], where <k> is a relevant value, e.g. "straight-5" would mean it's a straight pipeline of length 5, "ensemble-3" would mean it's an ensemble of three pipelines, etc.
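As a rough illustration of the "arch_type" idea, the sketch below builds and parses values such as "straight-5" in Python. The function names format_arch_type and parse_arch_type are hypothetical, and it assumes that <k> is a positive integer and that "random" is the only kind without a <k> suffix.

```python
# Assumption: these are the only kinds, taken from the list proposed above.
KINDS_WITH_K = {"straight", "ensemble", "stacked"}
KINDS_WITHOUT_K = {"random"}


def format_arch_type(kind, k=None):
    """Build an arch_type string such as "straight-5" or "random"."""
    if kind in KINDS_WITHOUT_K:
        if k is not None:
            raise ValueError(f"{kind!r} does not take a <k> value")
        return kind
    if kind in KINDS_WITH_K:
        if not isinstance(k, int) or k < 1:
            raise ValueError(f"{kind!r} needs a positive integer <k>")
        return f"{kind}-{k}"
    raise ValueError(f"unknown architecture kind: {kind!r}")


def parse_arch_type(arch_type):
    """Split an arch_type string into (kind, k); k is None for "random"."""
    kind, sep, k_text = arch_type.partition("-")
    if kind in KINDS_WITHOUT_K and not sep:
        return kind, None
    if kind in KINDS_WITH_K and k_text.isdigit() and int(k_text) >= 1:
        return kind, int(k_text)
    raise ValueError(f"malformed arch_type: {arch_type!r}")


print(format_arch_type("straight", 5))
print(parse_arch_type("ensemble-3"))
```

Encoding <k> in the string keeps the field simple to query by exact match, at the cost of needing a parser like this whenever the number itself matters.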