dvdmjohnson / d3m_michigan_primitives

Contains primitives developed by the University of Michigan team as part of the Data Driven Discovery (D3M) project

Evaluate using private dataset repo #34

Closed rszeto closed 4 years ago

rszeto commented 4 years ago

Our current pipelines are evaluated against the public dataset repo; they should be evaluated against the private one instead.

This involves changing the Docker mount to point to the appropriate cloned repository on the host and handling everything that breaks as a result.
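For reference, a minimal sketch of the mount change, written as a small Python helper that assembles the `docker run` arguments. The host path, image name, and the `/datasets` mount point are hypothetical placeholders, not the values used in this repo; only the `-v host:container` bind-mount form is standard Docker.

```python
import os
from typing import List


def docker_run_args(datasets_dir: str, image: str, mount_point: str = "/datasets") -> List[str]:
    """Build a `docker run` argv that bind-mounts a host dataset clone into the container."""
    # Docker expects an absolute host path on the left side of the bind mount.
    host_path = os.path.abspath(os.path.expanduser(datasets_dir))
    return ["docker", "run", "--rm", "-it", "-v", f"{host_path}:{mount_point}", image]


# Hypothetical values: point the mount at the private clone instead of the public one.
args = docker_run_args("/data/datasets-private", "example/d3m-image:latest")
print(" ".join(args))
```

Switching repos then only means changing the first argument; everything inside the container keeps reading from the same mount point.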

rszeto commented 4 years ago

The updated datasets do not seem to support semantic types. Check this by running the following in the private-datasets branch:

root@f1e4702c7a4a:/spider# eval `cat run_pipeline_cmds.txt | grep GRASTAAutoMPGPipeline`
WARNING:d3m.metadata.pipeline_run:'worker_id' was generated using the MAC address inside Docker container and is not a reliable compute resource identifier.
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MCPU
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MRAM
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_DIGEST
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_DIGEST
WARNING:d3m.metadata.pipeline:Digest for primitive 'fe20ef05-7eaf-428b-934f-4de0b8011ed2' does not match the one specified in the primitive description. Primitive description digest: 4212cc0a34c7083e8fa3b811674fc40fcff724aafd2a2e21b99a8a53de0d9fc7. Resolved primitive digest: 9ec1103dfcc9b16ff61b02ca8bc3f812742684a060f4c85cb148ff5a654f8d1a.
Traceback (most recent call last):
  File "/src/d3m/d3m/runtime.py", line 941, in _do_run_step
    self._run_step(step)
  File "/src/d3m/d3m/runtime.py", line 931, in _run_step
    self._run_primitive(step)
  File "/src/d3m/d3m/runtime.py", line 839, in _run_primitive
    multi_call_result = self._call_primitive_method(primitive.fit_multi_produce, fit_multi_produce_arguments)
  File "/src/d3m/d3m/runtime.py", line 914, in _call_primitive_method
    raise error
  File "/src/d3m/d3m/runtime.py", line 910, in _call_primitive_method
    result = method(**arguments)
  File "/src/d3m/d3m/primitive_interfaces/transformer.py", line 72, in fit_multi_produce
    return self._fit_multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, inputs=inputs)
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 565, in _fit_multi_produce
    produce_results = self._multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, **kwargs)
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 461, in _multi_produce
    results.append(getattr(self, method_name)(timeout=timeout, iterations=iterations, **arguments))
  File "/spider/src/common-primitives/common_primitives/extract_columns_semantic_types.py", line 98, in produce
    columns_to_use = self._get_columns(inputs.metadata)
  File "/spider/src/common-primitives/common_primitives/extract_columns_semantic_types.py", line 131, in _get_columns
    semantic_types=self.hyperparams['semantic_types'],
ValueError: Input data has no columns matching semantic types: ('https://metadata.datadrivendiscovery.org/types/Attribute',)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/src/d3m/d3m/__main__.py", line 6, in <module>
    cli.main(sys.argv)
  File "/src/d3m/d3m/cli.py", line 1111, in main
    handler(arguments, parser)
  File "/src/d3m/d3m/cli.py", line 1026, in handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/cli.py", line 516, in runtime_handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/runtime.py", line 2452, in fit_score_handler
    fit_result.check_success()
  File "/src/d3m/d3m/runtime.py", line 68, in check_success
    raise self.error
  File "/src/d3m/d3m/runtime.py", line 972, in _run
    self._do_run()
  File "/src/d3m/d3m/runtime.py", line 958, in _do_run
    self._do_run_step(step)
  File "/src/d3m/d3m/runtime.py", line 950, in _do_run_step
    ) from error
d3m.exceptions.StepFailedError: Step 2 for pipeline d11d005e-9a36-40c2-a879-722553f07f72 failed.

We need to review how semantic types are expressed in D3M datasets and figure out whether the new dataset format uses them.
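One quick way to check is to look at the column roles in a dataset's `datasetDoc.json`. The sketch below assumes the usual layout (`dataResources` entries, each with a `columns` list carrying `colName` and `role`) and assumes that the `attribute` role is what ends up as the `Attribute` semantic type named in the error above. The example document is a made-up toy, not a real dataset.

```python
from typing import Dict, List


def columns_by_role(dataset_doc: dict) -> Dict[str, List[str]]:
    """Map each role found in a datasetDoc to the names of the columns carrying it.

    Columns with a missing or empty `role` are collected under the key "<none>".
    """
    roles: Dict[str, List[str]] = {}
    for resource in dataset_doc.get("dataResources", []):
        for column in resource.get("columns", []):
            name = column.get("colName", "?")
            for role in column.get("role") or ["<none>"]:
                roles.setdefault(role, []).append(name)
    return roles


# Toy document for illustration only (column names are placeholders).
example_doc = {
    "dataResources": [
        {
            "resID": "learningData",
            "columns": [
                {"colIndex": 0, "colName": "d3mIndex", "colType": "integer", "role": ["index"]},
                {"colIndex": 1, "colName": "cylinders", "colType": "unknown"},
                {"colIndex": 2, "colName": "class", "colType": "unknown", "role": ["suggestedTarget"]},
            ],
        }
    ]
}

roles = columns_by_role(example_doc)
print(roles)
if "attribute" not in roles:
    print("No column is marked as an attribute; extracting by the Attribute semantic type would find nothing.")
```

To run it against a real dataset, load the file with `json.load` and pass the resulting dict to `columns_by_role`.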

rszeto commented 4 years ago

mitar and bschoenfeld on the D3M Slack recommend adding a simple profiler primitive from common primitives to the pipeline, presumably this one.

rszeto commented 4 years ago

Inserting the profiler between steps 0 and 1 of GRASTAAutoMPGPipeline gets past the genjson.py step, but the pipeline run still fails, this time during scoring:

root@304146f5291f:/spider# eval `cat run_pipeline_cmds.txt | grep GRASTAAutoMPGPipeline`
WARNING:d3m.metadata.pipeline_run:'worker_id' was generated using the MAC address inside Docker container and is not a reliable compute resource identifier.
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MCPU
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MRAM
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_DIGEST
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_DIGEST
ERROR:redirect:/usr/local/lib/python3.6/dist-packages/sklearn/svm/base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
ERROR:redirect:  "the number of iterations.", ConvergenceWarning)
Traceback (most recent call last):
  File "/src/d3m/d3m/runtime.py", line 941, in _do_run_step
    self._run_step(step)
  File "/src/d3m/d3m/runtime.py", line 931, in _run_step
    self._run_primitive(step)
  File "/src/d3m/d3m/runtime.py", line 839, in _run_primitive
    multi_call_result = self._call_primitive_method(primitive.fit_multi_produce, fit_multi_produce_arguments)
  File "/src/d3m/d3m/runtime.py", line 914, in _call_primitive_method
    raise error
  File "/src/d3m/d3m/runtime.py", line 910, in _call_primitive_method
    result = method(**arguments)
  File "/src/d3m/d3m/contrib/primitives/compute_scores.py", line 180, in fit_multi_produce
    inputs=inputs, score_dataset=score_dataset,
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 565, in _fit_multi_produce
    produce_results = self._multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, **kwargs)
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 461, in _multi_produce
    results.append(getattr(self, method_name)(timeout=timeout, iterations=iterations, **arguments))
  File "/src/d3m/d3m/contrib/primitives/compute_scores.py", line 111, in produce
    predictions = self._get_predictions(inputs)
  File "/src/d3m/d3m/contrib/primitives/compute_scores.py", line 250, in _get_predictions
    raise exceptions.InvalidArgumentValueError("No primary index column.")
d3m.exceptions.InvalidArgumentValueError: No primary index column.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/src/d3m/d3m/__main__.py", line 6, in <module>
    cli.main(sys.argv)
  File "/src/d3m/d3m/cli.py", line 1111, in main
    handler(arguments, parser)
  File "/src/d3m/d3m/cli.py", line 1026, in handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/cli.py", line 516, in runtime_handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/runtime.py", line 2503, in fit_score_handler
    score_result.check_success()
  File "/src/d3m/d3m/runtime.py", line 68, in check_success
    raise self.error
  File "/src/d3m/d3m/runtime.py", line 972, in _run
    self._do_run()
  File "/src/d3m/d3m/runtime.py", line 958, in _do_run
    self._do_run_step(step)
  File "/src/d3m/d3m/runtime.py", line 950, in _do_run_step
    ) from error
d3m.exceptions.StepFailedError: Step 0 for pipeline f596cd77-25f8-4d4c-a350-bb30ab1e58f6 failed.

rszeto commented 4 years ago

Actually, I just inserted the profiler incorrectly before. Adding it between steps 0 and 1 is exactly what needs to be done (e.g., see commit 9240c7f7d2acb79f720d3cbeb18c65aac1c240be).
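
For anyone repeating this on another pipeline, here is a rough sketch of inserting a step into a pipeline description and renumbering the `steps.N.<output>` references. It works on the JSON form as a plain dict, which is not how the pipelines in this repo are defined (they are Python classes that genjson.py turns into JSON), so treat it as an illustration of the rewiring only. It assumes a linear hand-off: every later step that read the predecessor's output should now read the inserted step's output under the same output name. The primitive entries are placeholders, not real primitive descriptions.

```python
import copy
import re
from typing import Any

_STEP_REF = re.compile(r"^steps\.(\d+)\.(.+)$")


def _shift_refs(node: Any, threshold: int) -> Any:
    """Recursively bump `steps.N.<output>` references with N >= threshold by one."""
    if isinstance(node, dict):
        return {key: _shift_refs(value, threshold) for key, value in node.items()}
    if isinstance(node, list):
        return [_shift_refs(value, threshold) for value in node]
    if isinstance(node, str):
        match = _STEP_REF.match(node)
        if match and int(match.group(1)) >= threshold:
            return f"steps.{int(match.group(1)) + 1}.{match.group(2)}"
    return node


def insert_step(pipeline: dict, index: int, new_step: dict) -> dict:
    """Return a copy of `pipeline` with `new_step` inserted at position `index`."""
    if index < 1:
        raise ValueError("index must be >= 1 so the new step has a predecessor")
    result = copy.deepcopy(pipeline)
    steps = result["steps"]
    # In later steps and in the pipeline outputs, references to the predecessor
    # (index - 1) are redirected to the new step, and references to later steps
    # move up by one. Both cases amount to "N >= index - 1 becomes N + 1".
    steps[index:] = [_shift_refs(step, index - 1) for step in steps[index:]]
    result["outputs"] = _shift_refs(result.get("outputs", []), index - 1)
    steps.insert(index, copy.deepcopy(new_step))
    return result


def _step(name: str, data: str) -> dict:
    """Build a placeholder primitive step (real steps also carry id, version, python_path, digest)."""
    return {
        "type": "PRIMITIVE",
        "primitive": {"name": name},
        "arguments": {"inputs": {"type": "CONTAINER", "data": data}},
        "outputs": [{"id": "produce"}],
    }


toy = {
    "inputs": [{"name": "inputs"}],
    "outputs": [{"data": "steps.1.produce", "name": "output"}],
    "steps": [
        _step("to_dataframe (placeholder)", "inputs.0"),
        _step("extract_attributes (placeholder)", "steps.0.produce"),
    ],
}

patched = insert_step(toy, 1, _step("profiler (placeholder)", "steps.0.produce"))
for number, step in enumerate(patched["steps"]):
    print(number, step["primitive"]["name"], "<-", step["arguments"]["inputs"]["data"])
print("outputs:", patched["outputs"][0]["data"])
```

The rewiring is the part that is easy to get wrong by hand: the inserted step reads from step 0, and everything after it has to read from the inserted step rather than from step 0.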