Create notebook1, which produces several output files, e.g. data/file1.csv, data/file2.csv, ...
Create notebook2 (it doesn't matter what it does because it won't run).
Create a pipeline notebook1 -> notebook2, configuring the output files for notebook1 as data/*.
Run the pipeline on Kubeflow Pipelines.
Result:
processing of notebook1 succeeds and the output files are properly uploaded to the COS bucket
processing of notebook2 fails:
[I 23:08:27.715] 'test_load_viz-1023160715':'Data_Viz' - downloaded Data_Viz-2289c970-b214-418b-bf93-68d880326eb0.tar.gz from bucket: pipeline-artifacts, object: test_load_viz-1023160715/Data_Viz-2289c970-b214-418b-bf93-68d880326eb0.tar.gz (0.042 secs)
Traceback (most recent call last):
  File "bootstrapper.py", line 402, in <module>
    main()
  File "bootstrapper.py", line 393, in main
    file_op.process_dependencies()
  File "bootstrapper.py", line 97, in process_dependencies
    self.get_file_from_object_storage(file.strip())
  File "bootstrapper.py", line 137, in get_file_from_object_storage
    self.cos_client.fget_object(bucket_name=self.cos_bucket,
  File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 719, in fget_object
    stat = self.stat_object(bucket_name, object_name, sse)
  File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 1138, in stat_object
    response = self._url_open('HEAD', bucket_name=bucket_name,
  File "/usr/local/lib/python3.8/site-packages/minio/api.py", line 2017, in _url_open
    raise ResponseError(response,
minio.error.NoSuchKey: NoSuchKey: message: The specified key does not exist.
The corresponding lines of code are:
    if inputs:
        input_list = inputs.split(INOUT_SEPARATOR)
        for file in input_list:
            self.get_file_from_object_storage(file.strip())  # <------ FAIL
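To illustrate why this loop fails when an input entry is a wildcard, here is a minimal sketch. It is not the bootstrapper code: FakeObjectStore, fetch_inputs_literal and fetch_inputs_expanded are hypothetical names, the in-memory store only stands in for the COS bucket, and the ";" value of INOUT_SEPARATOR is an assumption. The first function uses each entry as an exact object key, as the loop above does; the second shows one possible way to expand patterns against the stored keys.

```python
from fnmatch import fnmatchcase

INOUT_SEPARATOR = ";"  # assumed separator value, for illustration only


class FakeObjectStore:
    """Hypothetical in-memory stand-in for the COS bucket (not the minio API)."""

    def __init__(self, keys):
        self.keys = set(keys)

    def fetch(self, key):
        # Keys are looked up literally; "*" is just another character.
        if key not in self.keys:
            raise KeyError(f"NoSuchKey: {key}")
        return key

    def list_keys(self):
        return sorted(self.keys)


def fetch_inputs_literal(store, inputs):
    """Mirrors the loop above: every entry is used as an exact object key."""
    return [store.fetch(name.strip()) for name in inputs.split(INOUT_SEPARATOR)]


def fetch_inputs_expanded(store, inputs):
    """Hypothetical alternative: expand wildcard entries against stored keys."""
    fetched = []
    for name in inputs.split(INOUT_SEPARATOR):
        pattern = name.strip()
        if any(ch in pattern for ch in "*?["):
            matches = [k for k in store.list_keys() if fnmatchcase(k, pattern)]
            if not matches:
                # Name the offending entry so the failure is easy to trace.
                raise KeyError(f"no object matches pattern: {pattern}")
            fetched.extend(matches)
        else:
            fetched.append(store.fetch(pattern))
    return fetched
```

With a store holding data/file1.csv and data/file2.csv, fetch_inputs_literal(store, "data/*") raises KeyError, because no object is literally named data/*, while fetch_inputs_expanded(store, "data/*") returns both keys. Including the offending entry in the error message would also avoid having to export the pipeline to find out which input failed.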
To figure out which input file couldn't be processed I had to export the pipeline and inspect the generated bootstrapper script:
version 1.3
Input is set to data/bank-additional/*, which seems to cause the failure.
Issues: