IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

[Bug] Transforms images latest tags are not up to date with repo source #176

Open revit13 opened 1 month ago

revit13 commented 1 month ago

Component

Transforms/Other

What happened + What you expected to happen

Running the malware workflow from the KFP GUI produces the following error. I expect it to complete without an error.

time="2024-05-22T23:34:54.659Z" level=info msg="capturing logs" argo=true
23:35:07 INFO - submitted job successfully, submission id raysubmit_8WqH2dErjaTsWC8h
23:35:07 INFO - data factory data_ is using S3 data access: input path - test/malware/input, output path - test/malware/output
23:35:07 INFO - data factory data_ max_files -1, n_sample -1
23:35:07 INFO - data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.parquet']
23:35:09 INFO - job status is FAILED
23:35:08 INFO - Launching malware transform
usage: malware_transform.py [-h] [--malware_input_column MALWARE_INPUT_COLUMN]
                            [--malware_output_column MALWARE_OUTPUT_COLUMN]
                            [--data_s3_cred DATA_S3_CRED]
                            [--data_s3_config DATA_S3_CONFIG]
                            [--data_local_config DATA_LOCAL_CONFIG]
                            [--data_max_files DATA_MAX_FILES]
                            [--data_checkpointing DATA_CHECKPOINTING]
                            [--data_data_sets DATA_DATA_SETS]
                            [--data_files_to_use DATA_FILES_TO_USE]
                            [--data_num_samples DATA_NUM_SAMPLES]
                            [--runtime_pipeline_id RUNTIME_PIPELINE_ID]
                            [--runtime_job_id RUNTIME_JOB_ID]
                            [--runtime_code_location RUNTIME_CODE_LOCATION]
malware_transform.py: error: unrecognized arguments: --runtime_num_workers=4 --runtime_worker_options={'num_cpus': 0.8}
23:35:11 INFO - Job completed with execution status FAILED
Error: exit status 1

Reproduction script

First generate the workflow yaml:

cd kfp/transform_workflows/code/malware/
make PYTHON=python3.10 build

Next, upload and run the pipeline using the KFP GUI.

Anything else

If the malware image is built from source and loaded into the kind cluster using load-image, the workflow passes.

OS

Ubuntu

Python

3.10.x

daw3rd commented 2 weeks ago

If the versions/tags are really the problem, as the title currently suggests, this may be fixed by PR #309, since it sets all the versions to be the same.

daw3rd commented 3 days ago

Also, the "unrecognized arguments" message suggests that the image is the Python image and not the Ray image.
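
For anyone trying to confirm this, here is a minimal sketch of how the same command line can be accepted by one launcher and rejected by another. It is not the actual data-prep-kit launcher code: `build_parser`, `unrecognized`, and the `with_ray_options` switch are hypothetical names, and the assumption is simply that only the Ray flavour registers `--runtime_num_workers` and `--runtime_worker_options`. The two arguments are copied from the log above.

```python
import argparse


def build_parser(with_ray_options: bool) -> argparse.ArgumentParser:
    """Build a hypothetical parser that mimics the two launcher flavours."""
    parser = argparse.ArgumentParser(prog="malware_transform.py")
    # A few of the options listed in the usage message of the failing run.
    parser.add_argument("--malware_input_column")
    parser.add_argument("--runtime_pipeline_id")
    parser.add_argument("--runtime_job_id")
    if with_ray_options:
        # Assumption: only a Ray-based launcher registers these two options.
        parser.add_argument("--runtime_num_workers", type=int)
        parser.add_argument("--runtime_worker_options")
    return parser


def unrecognized(parser: argparse.ArgumentParser, argv: list[str]) -> list[str]:
    """Return the arguments the parser does not know, without exiting."""
    _, unknown = parser.parse_known_args(argv)
    return unknown


# The two arguments reported as unrecognized in the issue's log.
ARGV = ["--runtime_num_workers=4", "--runtime_worker_options={'num_cpus': 0.8}"]

python_like = unrecognized(build_parser(with_ray_options=False), ARGV)
ray_like = unrecognized(build_parser(with_ray_options=True), ARGV)

print("python-like parser rejects:", python_like)
print("ray-like parser rejects:", ray_like)
```

If the published image behaves like the first parser, then the image tagged latest was most likely built from the Python runtime, or from an older source tree, rather than from the current Ray sources.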