Lightning-AI / litdata

Transform datasets at scale. Optimize datasets for fast AI model training.
Apache License 2.0

LitData leaves a `status.json` in current working dir #159

Closed: awaelchli closed this issue 2 months ago

awaelchli commented 3 months ago

🐛 Bug

Every time litdata executes (in a Studio), a `status.json` file is left behind in the current working directory.

To Reproduce

Run any litdata example in a Studio, for example the one from #158.

Expected behavior

No temporary files are left in the current working directory.

Environment

Lightning Studio, litdata==0.2.8

deependujha commented 3 months ago

The corresponding code for this is at src/litdata/processing/data_processor.py#998

Screenshot from 2024-06-10 15-59-43

Was this expected to be some sort of reporting mechanism?

To fix the issue, we could simply comment out that code. It must have been added for some reason, though I couldn't find any usage of it.

If this fix is acceptable, I will be happy to open a PR.

deependujha commented 3 months ago

This was handled in https://github.com/Lightning-AI/litdata/pull/161/files#diff-0f269073e71092da410b41c82ebf3a4eff2d69fe2e66ee648ba2ca53c9234d9b

_ENABLE_STATUS = bool(int(os.getenv("ENABLE_STATUS_REPORT", "0")))

Now, to get the `status.json` file, set the environment variable by running `export ENABLE_STATUS_REPORT=1` in your terminal.
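
For readers who want to see how such an opt-in flag typically gates a file write, here is a minimal sketch. Only the `ENABLE_STATUS_REPORT` parsing expression comes from the line quoted above. The helper names `status_enabled` and `maybe_write_status` and the `{"progress": ...}` payload are hypothetical and are not litdata's actual code.

```python
import json
import os
from typing import Optional


def status_enabled() -> bool:
    # Same parsing as the quoted line: unset or "0" -> False, "1" -> True.
    # Read at call time here (an assumption made so the sketch is easy to test).
    return bool(int(os.getenv("ENABLE_STATUS_REPORT", "0")))


def maybe_write_status(progress: float, directory: str = ".") -> Optional[str]:
    """Hypothetical helper: write status.json only when the flag is enabled."""
    if not status_enabled():
        return None  # Default behavior: leave no file behind.
    path = os.path.join(directory, "status.json")
    with open(path, "w") as f:
        # The payload shape is an illustrative assumption.
        json.dump({"progress": progress}, f)
    return path
```

Because the value goes through `int(...)`, a non-numeric setting such as `ENABLE_STATUS_REPORT=true` would raise a `ValueError` rather than enable the report, so use `0` or `1`.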

I believe this issue can be closed now.