IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

[Bug] Python launcher error when the child process dies #770

Open sujee opened 2 weeks ago

sujee commented 2 weeks ago

Component

Library/core

What happened + What you expected to happen

Code : https://github.com/IBM/data-prep-kit/blob/dev/examples/notebooks/intro/dpk_intro_1_python.ipynb

We launch the Python launcher like this:

launcher = PythonTransformLauncher(Pdf2ParquetPythonTransformConfiguration())
return_code = launcher.launch()

When the process dies, the reporting code errors out as follows:

Traceback (most recent call last):
  File "/home/sujee/apps/anaconda3/envs/dpk-5-basic-022dev2-py312/lib/python3.12/site-packages/data_processing/runtime/pure_python/transform_orchestrator.py", line 131, in orchestrate
    stats["processing_time"] = round(stats["processing_time"], 3)
                                     ~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'processing_time'
21:47:10 ERROR - Exception during execution 'processing_time': None

When we read runtime stats such as processing_time, we need to make sure these keys actually exist, since they may never be set if the child process dies.

This has to be done for both the Python and Ray launchers.
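A minimal sketch of the kind of guard meant here, assuming `stats` is a plain dict as the traceback suggests. `finalize_stats` is a hypothetical helper for illustration, not part of data-prep-kit, and the actual fix in the orchestrator may look different:

```python
from typing import Any


def finalize_stats(stats: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of stats with processing_time rounded, if it is present.

    Hypothetical helper: if the child process died before recording
    processing_time, the key is simply left out instead of raising KeyError.
    """
    result = dict(stats)  # copy so the caller's dict is not mutated
    value = result.get("processing_time")  # None when the key is missing
    if value is not None:
        result["processing_time"] = round(value, 3)
    return result


# Normal run: the key exists and gets rounded.
print(finalize_stats({"processing_time": 1.23456, "source_files": 2}))

# Child process died: the key is absent and no KeyError is raised.
print(finalize_stats({"source_files": 2}))
```

Using `dict.get` keeps the reporting path from masking the original failure with a secondary `KeyError`.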

Reproduction script

Code Step 3.2: https://github.com/IBM/data-prep-kit/blob/dev/examples/notebooks/intro/dpk_intro_1_python.ipynb

Anything else

No response

OS

Ubuntu

Python

3.11.x

sujee commented 2 weeks ago

May be related to #719.

daw3rd commented 2 weeks ago

I believe this has been addressed in PR #721 and will be available in a future release.