epeters3 commented 4 years ago

Closes #51.

When reviewing this PR, I would recommend looking at the changes for each commit one at a time.

Allows EZPipeline instance to track custom refs
Adds more working pre-processors and classifiers and removes not working ones
Fixes bugs related to changing default hyperparameters
Changes data's flow through pipeline primitives
Allows mapping pipeline arguments to custom data reference
Unifies argument names and adds "subscribe to CLI args" paradigm
Adds CLI arguments used by random experiment
Adds random experiment

codecov-io commented 4 years ago

Codecov Report

Merging #67 into develop will decrease coverage by 4.57%. The diff coverage is 84.61%.

@@             Coverage Diff             @@
##           develop      #67      +/-   ##
===========================================
- Coverage     70.3%   65.72%   -4.58%     
===========================================
  Files           22       27       +5     
  Lines         1266     1701     +435     
===========================================
+ Hits           890     1118     +228     
- Misses         376      583     +207

Impacted Files	Coverage Δ
experimenter/experiments/straight.py	`70.45% <ø> (ø)`	:arrow_up:
experimenter/experiments/ensemble.py	`79.33% <ø> (ø)`	:arrow_up:
test/test_ensembling_pipelines_generation.py	`96.49% <ø> (-0.53%)`	:arrow_down:
test/test_pipeline_builder.py	`96.96% <100%> (-0.05%)`	:arrow_down:
experimenter/experiments/experiment.py	`80% <100%> (ø)`	:arrow_up:
experimenter/experiments/stacked.py	`75% <100%> (ø)`	:arrow_up:
experimenter/pipeline_builder.py	`98.03% <100%> (+2.84%)`	:arrow_up:
test/test_random_pipelines.py	`100% <100%> (ø)`
experimenter/experiments/metafeatures.py	`45.45% <100%> (ø)`	:arrow_up:
test/test_cli.py	`100% <100%> (ø)`
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4669e93...3aebaec. Read the comment docs.

epeters3 commented 4 years ago

It looks like the above Travis CI Branch build failed because of a test that fails sometimes with low probability, even though the test still works. So I think we can consider the builds passing for this PR.

epeters3 commented 4 years ago

I’m so glad it was readable for you, thank you so much for reviewing it, I’m sorry it was so large.

On Wed, Oct 2, 2019 at 5:01 PM orionw notifications@github.com wrote:

@orionw approved this pull request.

Very readable code! Looks great - as does the command line args. They're starting to get large so I think this is a good way to handle it.

In experimenter/experiments/random.py https://github.com/byu-dml/d3m-experimenter/pull/67#discussion_r330806645 :

num_inputs = random.randint(1, max_num_inputs)

There are only len(output_collection) possible inputs available.

num_inputs = min(len(output_collection), num_inputs)

input_refs: Set[str] = set(random.sample(output_collection, num_inputs))

Since the steps identified by input_refs are now being used

as inputs, we know they're not terminal nodes anymore.

terminal_node_data_refs -= input_refs

concat_result_ref = self._concatenate_inputs(structure, input_refs, concat_cache)

Add the do nothing primitive as a placeholder. self._sample_pipeline will fill

in the primitives later.

add_pipeline_step(

structure,

'd3m.primitives.data_preprocessing.do_nothing.DSBOX',

concat_result_ref

I like the no_nothing primitive here!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/byu-dml/d3m-experimenter/pull/67?email_source=notifications&email_token=AFBLEZB3AJGDFJ476FDIAWLQMUR4PA5CNFSM4I44HEBKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCGWVBLI#pullrequestreview-296571053, or mute the thread https://github.com/notifications/unsubscribe-auth/AFBLEZA7AZLZZDH2UJOC7BDQMUR4PANCNFSM4I44HEBA .

bjschoenfeld commented 4 years ago

It looks like the above Travis CI Branch build failed because of a test that fails sometimes with low probability, even though the test still works. So I think we can consider the builds passing for this PR.

Can we make the test deterministic, so that it passes every time?

orionw commented 4 years ago

@bjschoenfeld it was supposed to test randomness - we have other tests that do it without the slight probability of failure. I'm for just taking the test out - the other tests cover it already. This test was supposed to be non-deterministic, back when I made it a long time ago.

byu-dml / d3m-experimenter

Add Random Experiment #67

Codecov Report

Very readable code! Looks great - as does the command line args. They're starting to get large so I think this is a good way to handle it.

There are only `len(output_collection)` possible inputs available.

Since the steps identified by `input_refs` are now being used

as inputs, we know they're not terminal nodes anymore.

Add the do nothing primitive as a placeholder. `self._sample_pipeline` will fill

in the primitives later.

byu-dml / d3m-experimenter

Add Random Experiment #67

Codecov Report

Very readable code! Looks great - as does the command line args. They're starting to get large so I think this is a good way to handle it.

There are only len(output_collection) possible inputs available.

Since the steps identified by input_refs are now being used

as inputs, we know they're not terminal nodes anymore.

Add the do nothing primitive as a placeholder. self._sample_pipeline will fill

in the primitives later.

There are only `len(output_collection)` possible inputs available.

Since the steps identified by `input_refs` are now being used

Add the do nothing primitive as a placeholder. `self._sample_pipeline` will fill