NVIDIA / NeMo-Curator

Scalable toolkit for data curation
Apache License 2.0

Import fails on CPU #109

Closed: yyu22 closed this issue 14 hours ago

yyu22 commented 2 weeks ago

Describe the bug

The GPU version of NeMo Curator fails during import when run on CPU-only nodes.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 17, in <module>
    from cupy import _core  # NOQA
  File "/usr/local/lib/python3.10/dist-packages/cupy/_core/__init__.py", line 3, in <module>
    from cupy._core import core  # NOQA
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/NeMo-Curator/nemo_curator/__init__.py", line 29, in <module>
    from .modules import *
  File "/opt/NeMo-Curator/nemo_curator/modules/__init__.py", line 24, in <module>
    from .add_id import AddId
  File "/opt/NeMo-Curator/nemo_curator/modules/add_id.py", line 21, in <module>
    from nemo_curator.datasets import DocumentDataset
  File "/opt/NeMo-Curator/nemo_curator/datasets/__init__.py", line 15, in <module>
    from .doc_dataset import DocumentDataset
  File "/opt/NeMo-Curator/nemo_curator/datasets/doc_dataset.py", line 19, in <module>
    from nemo_curator.utils.distributed_utils import read_data, write_to_disk
  File "/opt/NeMo-Curator/nemo_curator/utils/distributed_utils.py", line 32, in <module>
    cudf = gpu_only_import("cudf")
  File "/opt/NeMo-Curator/nemo_curator/utils/import_utils.py", line 347, in gpu_only_import
    return safe_import(
  File "/opt/NeMo-Curator/nemo_curator/utils/import_utils.py", line 261, in safe_import
    return importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 12, in <module>
    import cupy
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 19, in <module>
    raise ImportError(f'''
ImportError: 
================================================================
Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
  https://docs.cupy.dev/en/latest/install.html

Original error:
  ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
================================================================
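The traceback shows the failure chain: distributed_utils.py calls gpu_only_import("cudf"), which goes through safe_import into importlib.import_module. cudf then imports CuPy, and CuPy raises ImportError because libcuda.so.1 is missing. That ImportError propagates out of safe_import, so the whole package import fails. One possible remedy is to catch any ImportError (not only a missing package) and return a placeholder module that raises only when a GPU attribute is actually used. The sketch below assumes that behavior is acceptable. UnavailableModule and gpu_only_import_sketch are hypothetical names, not NeMo Curator's actual API.

```python
import importlib
import types


class UnavailableModule(types.ModuleType):
    """Hypothetical placeholder returned when a GPU-only module cannot be imported.

    Import succeeds; the error is deferred until an attribute is accessed,
    so CPU-only code paths that never touch the module keep working.
    """

    def __init__(self, name: str, reason: str) -> None:
        super().__init__(name)
        self._reason = reason

    def __getattr__(self, attr: str):
        # Called only for attributes not found normally.
        if attr.startswith("__"):
            # Keep hasattr()/introspection working: dunder lookups get AttributeError.
            raise AttributeError(attr)
        raise ImportError(
            f"{self.__name__}.{attr} requires a GPU environment; "
            f"the import failed with: {self._reason}"
        )


def gpu_only_import_sketch(module: str) -> types.ModuleType:
    """Hypothetical variant of a gpu_only_import helper."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        # ImportError covers ModuleNotFoundError as well as the CuPy error
        # raised when libcuda.so.1 cannot be loaded.
        return UnavailableModule(module, str(exc))
```

With this pattern, cudf = gpu_only_import_sketch("cudf") succeeds on a CPU-only node, and only a later call such as cudf.DataFrame raises an ImportError that includes the original libcuda.so.1 message.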

Steps/Code to reproduce bug

  1. Install the GPU version of NeMo Curator, or use the NeMo Framework container.

  2. Run import nemo_curator on a CPU-only node or machine.
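To confirm that a node is missing the CUDA driver library that CuPy needs, a quick check with the standard library's ctypes can be run before step 2. This is a minimal sketch; the library name libcuda.so.1 is taken from the error message above and is Linux-specific, and cuda_driver_available is a hypothetical helper name.

```python
import ctypes


def cuda_driver_available(lib_name: str = "libcuda.so.1") -> bool:
    """Return True if the CUDA driver library can be loaded on this machine."""
    try:
        ctypes.CDLL(lib_name)
    except OSError:
        # Same missing library that makes CuPy (and therefore cudf) fail to import.
        return False
    return True


if __name__ == "__main__":
    print("CUDA driver found:", cuda_driver_available())
```

On a CPU-only node this should report False, matching the "cannot open shared object file" error in the traceback.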

Expected behavior

The GPU version should still work on CPU-only nodes for steps that do not require a GPU (e.g., AddId).
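A different way to get this behavior is to defer GPU imports until first use, so that import nemo_curator never loads cudf or CuPy unless a GPU-backed function actually runs. The sketch below assumes a proxy-module approach; LazyModule is a hypothetical helper, not part of NeMo Curator, and a real fix might instead move imports inside the functions that need them.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical proxy that imports the wrapped module on first attribute access."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self._module = None

    def _load(self) -> types.ModuleType:
        if self._module is None:
            # The real import (and any libcuda.so.1 error) happens here, not at package import.
            self._module = importlib.import_module(self.__name__)
        return self._module

    def __getattr__(self, attr: str):
        # Called only for attributes not found on the proxy itself.
        return getattr(self._load(), attr)
```

For example, cudf = LazyModule("cudf") at module level would let CPU-only steps such as AddId run, while any code path that touches cudf.DataFrame would still surface the original ImportError on a CPU-only node.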