The updated datasets do not seem to support semantic types. You can verify this by running the following on the private-datasets branch:
root@f1e4702c7a4a:/spider# eval `cat run_pipeline_cmds.txt | grep GRASTAAutoMPGPipeline`
WARNING:d3m.metadata.pipeline_run:'worker_id' was generated using the MAC address inside Docker container and is not a reliable compute resource identifier.
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MCPU
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MRAM
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_DIGEST
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_DIGEST
WARNING:d3m.metadata.pipeline:Digest for primitive 'fe20ef05-7eaf-428b-934f-4de0b8011ed2' does not match the one specified in the primitive description. Primitive description digest: 4212cc0a34c7083e8fa3b811674fc40fcff724aafd2a2e21b99a8a53de0d9fc7. Resolved primitive digest: 9ec1103dfcc9b16ff61b02ca8bc3f812742684a060f4c85cb148ff5a654f8d1a.
Traceback (most recent call last):
  File "/src/d3m/d3m/runtime.py", line 941, in _do_run_step
    self._run_step(step)
  File "/src/d3m/d3m/runtime.py", line 931, in _run_step
    self._run_primitive(step)
  File "/src/d3m/d3m/runtime.py", line 839, in _run_primitive
    multi_call_result = self._call_primitive_method(primitive.fit_multi_produce, fit_multi_produce_arguments)
  File "/src/d3m/d3m/runtime.py", line 914, in _call_primitive_method
    raise error
  File "/src/d3m/d3m/runtime.py", line 910, in _call_primitive_method
    result = method(**arguments)
  File "/src/d3m/d3m/primitive_interfaces/transformer.py", line 72, in fit_multi_produce
    return self._fit_multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, inputs=inputs)
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 565, in _fit_multi_produce
    produce_results = self._multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, **kwargs)
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 461, in _multi_produce
    results.append(getattr(self, method_name)(timeout=timeout, iterations=iterations, **arguments))
  File "/spider/src/common-primitives/common_primitives/extract_columns_semantic_types.py", line 98, in produce
    columns_to_use = self._get_columns(inputs.metadata)
  File "/spider/src/common-primitives/common_primitives/extract_columns_semantic_types.py", line 131, in _get_columns
    semantic_types=self.hyperparams['semantic_types'],
ValueError: Input data has no columns matching semantic types: ('https://metadata.datadrivendiscovery.org/types/Attribute',)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/src/d3m/d3m/__main__.py", line 6, in <module>
    cli.main(sys.argv)
  File "/src/d3m/d3m/cli.py", line 1111, in main
    handler(arguments, parser)
  File "/src/d3m/d3m/cli.py", line 1026, in handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/cli.py", line 516, in runtime_handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/runtime.py", line 2452, in fit_score_handler
    fit_result.check_success()
  File "/src/d3m/d3m/runtime.py", line 68, in check_success
    raise self.error
  File "/src/d3m/d3m/runtime.py", line 972, in _run
    self._do_run()
  File "/src/d3m/d3m/runtime.py", line 958, in _do_run
    self._do_run_step(step)
  File "/src/d3m/d3m/runtime.py", line 950, in _do_run_step
    ) from error
d3m.exceptions.StepFailedError: Step 2 for pipeline d11d005e-9a36-40c2-a879-722553f07f72 failed.
We need to review how semantic types are expressed in D3M datasets and figure out whether the new dataset format uses them.
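To make that concrete, here is a minimal sketch (untested; the dataset URI is a placeholder) that dumps the semantic types each column of a loaded dataset actually carries, using the d3m metadata API:

from d3m.container import dataset as dataset_module
from d3m.metadata import base as metadata_base

# Placeholder URI; point this at the datasetDoc.json of the dataset under test.
ds = dataset_module.Dataset.load('file:///datasets/autompg/datasetDoc.json')

resource_id = 'learningData'
num_columns = ds.metadata.query((resource_id, metadata_base.ALL_ELEMENTS))['dimension']['length']

for column_index in range(num_columns):
    column_metadata = ds.metadata.query((resource_id, metadata_base.ALL_ELEMENTS, column_index))
    print(column_index, column_metadata.get('name'), column_metadata.get('semantic_types', ()))

If no column reports https://metadata.datadrivendiscovery.org/types/Attribute, that directly explains the ValueError above.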
D3M Slack (mitar/bschoenfeld) recommend adding a simple profiler from common primitives, presumably this one.
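Assuming "this one" is common-primitives' SimpleProfiler (python path d3m.primitives.schema_discovery.profiler.Common), a rough standalone sketch of what it buys us, namely re-inferring semantic types on a DataFrame that has none (dataset URI again a placeholder):

from d3m import index
from d3m.container import dataset as dataset_module

ds = dataset_module.Dataset.load('file:///datasets/autompg/datasetDoc.json')  # placeholder URI

# Step 0 equivalent: convert the dataset's learningData resource to a DataFrame.
d2d_class = index.get_primitive('d3m.primitives.data_transformation.dataset_to_dataframe.Common')
d2d = d2d_class(hyperparams=d2d_class.metadata.get_hyperparams().defaults())
dataframe = d2d.produce(inputs=ds).value

# The profiler infers semantic types (Attribute, categorical/numeric column types, etc.).
profiler_class = index.get_primitive('d3m.primitives.schema_discovery.profiler.Common')
profiler = profiler_class(hyperparams=profiler_class.metadata.get_hyperparams().defaults())
profiler.set_training_data(inputs=dataframe)
profiler.fit()
profiled = profiler.produce(inputs=dataframe).value

# Should now list attribute columns instead of coming up empty.
print(profiled.metadata.list_columns_with_semantic_types(
    ['https://metadata.datadrivendiscovery.org/types/Attribute']))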
Squeezing this between steps 0 and 1 in GRASTAAutoMPGPipeline passed the genjson.py
step, but not the pipeline running step.
root@304146f5291f:/spider# eval `cat run_pipeline_cmds.txt | grep GRASTAAutoMPGPipeline`
WARNING:d3m.metadata.pipeline_run:'worker_id' was generated using the MAC address inside Docker container and is not a reliable compute resource identifier.
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MCPU
WARNING:d3m.metadata.pipeline_run:Configuration environment variable not set: D3MRAM
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_BASE_IMAGE_DIGEST
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_NAME
WARNING:d3m.metadata.pipeline_run:Docker image environment variable not set: D3M_IMAGE_DIGEST
ERROR:redirect:/usr/local/lib/python3.6/dist-packages/sklearn/svm/base.py:929: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
ERROR:redirect: "the number of iterations.", ConvergenceWarning)
Traceback (most recent call last):
  File "/src/d3m/d3m/runtime.py", line 941, in _do_run_step
    self._run_step(step)
  File "/src/d3m/d3m/runtime.py", line 931, in _run_step
    self._run_primitive(step)
  File "/src/d3m/d3m/runtime.py", line 839, in _run_primitive
    multi_call_result = self._call_primitive_method(primitive.fit_multi_produce, fit_multi_produce_arguments)
  File "/src/d3m/d3m/runtime.py", line 914, in _call_primitive_method
    raise error
  File "/src/d3m/d3m/runtime.py", line 910, in _call_primitive_method
    result = method(**arguments)
  File "/src/d3m/d3m/contrib/primitives/compute_scores.py", line 180, in fit_multi_produce
    inputs=inputs, score_dataset=score_dataset,
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 565, in _fit_multi_produce
    produce_results = self._multi_produce(produce_methods=produce_methods, timeout=timeout, iterations=iterations, **kwargs)
  File "/src/d3m/d3m/primitive_interfaces/base.py", line 461, in _multi_produce
    results.append(getattr(self, method_name)(timeout=timeout, iterations=iterations, **arguments))
  File "/src/d3m/d3m/contrib/primitives/compute_scores.py", line 111, in produce
    predictions = self._get_predictions(inputs)
  File "/src/d3m/d3m/contrib/primitives/compute_scores.py", line 250, in _get_predictions
    raise exceptions.InvalidArgumentValueError("No primary index column.")
d3m.exceptions.InvalidArgumentValueError: No primary index column.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/src/d3m/d3m/__main__.py", line 6, in <module>
    cli.main(sys.argv)
  File "/src/d3m/d3m/cli.py", line 1111, in main
    handler(arguments, parser)
  File "/src/d3m/d3m/cli.py", line 1026, in handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/cli.py", line 516, in runtime_handler
    problem_resolver=problem_resolver,
  File "/src/d3m/d3m/runtime.py", line 2503, in fit_score_handler
    score_result.check_success()
  File "/src/d3m/d3m/runtime.py", line 68, in check_success
    raise self.error
  File "/src/d3m/d3m/runtime.py", line 972, in _run
    self._do_run()
  File "/src/d3m/d3m/runtime.py", line 958, in _do_run
    self._do_run_step(step)
  File "/src/d3m/d3m/runtime.py", line 950, in _do_run_step
    ) from error
d3m.exceptions.StepFailedError: Step 0 for pipeline f596cd77-25f8-4d4c-a350-bb30ab1e58f6 failed.
Actually, I just did it wrong before. This is exactly what needs to be done (e.g. see 9240c7f7d2acb79f720d3cbeb18c65aac1c240be).
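For posterity, a hypothetical reconstruction of the shape of that change (the commit above is authoritative; `pipeline` here stands for the Pipeline object that GRASTAAutoMPGPipeline builds):

from d3m import index
from d3m.metadata.base import ArgumentType
from d3m.metadata.pipeline import PrimitiveStep

# While building the pipeline, add the profiler immediately after the
# dataset-to-dataframe step so it becomes step 1.
profiler_step = PrimitiveStep(primitive=index.get_primitive(
    'd3m.primitives.schema_discovery.profiler.Common'))
profiler_step.add_argument(name='inputs', argument_type=ArgumentType.CONTAINER,
                           data_reference='steps.0.produce')
profiler_step.add_output('produce')
pipeline.add_step(profiler_step)

# Every downstream step must then read from the shifted references, e.g. what
# used to be data_reference='steps.1.produce' becomes 'steps.2.produce'.
# Missing one of these shifts is one plausible way to drop the d3mIndex column
# and hit the "No primary index column." error above.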
Our current pipelines use the public dataset repo, whereas we should use the private one.
This involves changing the Docker mount to point to the appropriate cloned repository on the host and handling everything that breaks as a result.