bytedance / flux

A fast communication-overlapping library for tensor parallelism on GPUs.
Apache License 2.0

[BUG] Failing to install byte-flux from pypi #30

Open tlrmchlsmth opened 3 months ago

tlrmchlsmth commented 3 months ago

Describe the bug

I'm unable to install byte-flux from PyPI.

To Reproduce

In a fresh venv, run:

pip install torch packaging wheel numpy

and then

pip install byte-flux

Sidenote: It should be possible to install flux without manually installing the requirements.
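
A likely reason the manual pre-install is needed is that setup.py imports torch (and possibly packaging, wheel, and numpy) at build time without declaring them as PEP 518 build requirements; this is an assumption, and the project's setup.py is the authority. A minimal sketch of a pyproject.toml [build-system] table that would let pip provision them automatically:

[build-system]
# Hypothetical build requirements; they should mirror whatever setup.py imports at build time.
requires = ["setuptools", "wheel", "packaging", "numpy", "torch"]
build-backend = "setuptools.build_meta"

Note that pip's isolated build environment would then install its own copy of torch, which may not match the runtime torch, so torch-extension projects often document pip install --no-build-isolation instead.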

Stack trace/logs

~/tmp » pip install byte-flux
Collecting byte-flux
  Using cached byte_flux-1.0.2.tar.gz (216 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in ./flux_env/lib/python3.10/site-packages (from byte-flux) (2.4.0)
Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (9.1.0.70)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (12.1.105)
Requirement already satisfied: triton==3.0.0 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (3.0.0)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (12.1.0.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (11.4.5.107)
Requirement already satisfied: filelock in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (3.15.4)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (11.0.2.54)
Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (2.20.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (12.1.105)
Requirement already satisfied: networkx in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (3.3)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (10.3.2.106)
Requirement already satisfied: typing-extensions>=4.8.0 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (4.12.2)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (12.1.3.1)
Requirement already satisfied: sympy in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (1.13.2)
Requirement already satisfied: fsspec in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (2024.6.1)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (12.1.105)
Requirement already satisfied: jinja2 in ./flux_env/lib/python3.10/site-packages (from torch->byte-flux) (3.1.4)
Requirement already satisfied: nvidia-nvjitlink-cu12 in ./flux_env/lib/python3.10/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->byte-flux) (12.6.20)
Requirement already satisfied: MarkupSafe>=2.0 in ./flux_env/lib/python3.10/site-packages (from jinja2->torch->byte-flux) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in ./flux_env/lib/python3.10/site-packages (from sympy->torch->byte-flux) (1.3.0)
Building wheels for collected packages: byte-flux
  Building wheel for byte-flux (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [90 lines of output]
      /home/tms/tmp/flux_env/lib/python3.10/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
        warnings.warn(
      running bdist_wheel
      Precompiled wheel not found. Building from source...
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.10
      creating build/lib.linux-x86_64-3.10/flux
      copying python/flux/__init__.py -> build/lib.linux-x86_64-3.10/flux
      copying python/flux/ag_kernel_crossnode.py -> build/lib.linux-x86_64-3.10/flux
      copying python/flux/cpp_mod.py -> build/lib.linux-x86_64-3.10/flux
      copying python/flux/util.py -> build/lib.linux-x86_64-3.10/flux
      copying python/flux/ag_gemm.py -> build/lib.linux-x86_64-3.10/flux
      copying python/flux/dist_utils.py -> build/lib.linux-x86_64-3.10/flux
      copying python/flux/gemm_rs_sm80.py -> build/lib.linux-x86_64-3.10/flux
      running build_ext
      building 'flux_ths_pybind' extension
      Emitting ninja build file /tmp/pip-install-aocei8gj/byte-flux_3822a6dd00d6414daba3ae0de3930a5a/build/temp.linux-x86_64-3.10/build.ninja...
      Traceback (most recent call last):
        File "/tmp/pip-install-aocei8gj/byte-flux_3822a6dd00d6414daba3ae0de3930a5a/setup.py", line 217, in run
          urllib.request.urlretrieve(wheel_url, wheel_filename)
        File "/usr/lib/python3.10/urllib/request.py", line 241, in urlretrieve
          with contextlib.closing(urlopen(url, data)) as fp:
        File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
          return opener.open(url, data, timeout)
        File "/usr/lib/python3.10/urllib/request.py", line 525, in open
          response = meth(req, response)
        File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
          response = self.parent.error(
        File "/usr/lib/python3.10/urllib/request.py", line 563, in error
          return self._call_chain(*args)
        File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
          result = func(*args)
        File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
          raise HTTPError(req.full_url, code, msg, hdrs, fp)
      urllib.error.HTTPError: HTTP Error 404: Not Found

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-aocei8gj/byte-flux_3822a6dd00d6414daba3ae0de3930a5a/setup.py", line 276, in <module>
          main()
        File "/tmp/pip-install-aocei8gj/byte-flux_3822a6dd00d6414daba3ae0de3930a5a/setup.py", line 251, in main
          setuptools.setup(
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-aocei8gj/byte-flux_3822a6dd00d6414daba3ae0de3930a5a/setup.py", line 234, in run
          super().run()
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/wheel/_bdist_wheel.py", line 378, in run
          self.run_command("build")
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 866, in build_extensions
          build_ext.build_extensions(self)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
          self._build_extensions_serial()
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
          self.build_extension(ext)
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
          _build_ext.build_extension(self, ext)
        File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
          objects = self.compiler.compile(sources,
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 679, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1771, in _write_ninja_file_and_compile_objects
          _write_ninja_file(
        File "/home/tms/tmp/flux_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2300, in _write_ninja_file
          assert len(sources) > 0
      AssertionError
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for byte-flux
  Running setup.py clean for byte-flux
Failed to build byte-flux
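
The trace above shows the failure path: setup.py first tries to download a precompiled wheel (the urlretrieve call at setup.py line 217), the download returns HTTP 404, the build falls back to compiling from source ("Precompiled wheel not found. Building from source..."), and the source build then dies in torch's cpp_extension with assert len(sources) > 0, i.e. the extension ends up with an empty source list because the PyPI sdist does not appear to ship the C++/CUDA sources. A minimal sketch, assuming that structure, of a fallback that fails loudly instead of continuing into a doomed source build (the function name and message are illustrative, not byte-flux's actual code):

import urllib.error
import urllib.request

def fetch_prebuilt_wheel(wheel_url: str, wheel_filename: str) -> bool:
    # Try to download a precompiled wheel; report clearly instead of silently falling back.
    try:
        urllib.request.urlretrieve(wheel_url, wheel_filename)
        return True
    except urllib.error.HTTPError as err:
        # A 404 here means no wheel was published for this torch/CUDA combination.
        raise RuntimeError(
            f"No precompiled byte-flux wheel at {wheel_url} (HTTP {err.code}); "
            "building from source needs the C++/CUDA sources, which this sdist does not include."
        ) from err
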
wenlei-bao commented 3 months ago

cc @zheng-ningxin

Rainlin007 commented 3 months ago

Same here.

zheng-ningxin commented 2 months ago

Hello @tlrmchlsmth @Rainlin007, sorry for the late reply. Could you please let me know your torch.__version__ and torch.version.cuda values?
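
For anyone else reporting versions, both values can be printed directly; torch.__version__ and torch.version.cuda are standard PyTorch attributes:

import torch

print(torch.__version__)    # framework version, e.g. 2.4.0
print(torch.version.cuda)   # CUDA version torch was built against, e.g. 12.1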

tlrmchlsmth commented 2 months ago

torch is 2.4 and torch.version.cuda is 12.1

If it helps, here is my pip freeze output after running pip install torch packaging wheel numpy and before running pip install byte-flux

filelock==3.15.4
fsspec==2024.6.1
Jinja2==3.1.4
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==2.1.0
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==24.1
sympy==1.13.2
torch==2.4.0
triton==3.0.0
typing_extensions==4.12.2

This is a blocker on https://github.com/vllm-project/vllm/pull/5917.

houqi commented 2 months ago

It seems no prebuilt wheel was compiled against torch 2.4.0. @zheng-ningxin
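
One way to check whether a matching prebuilt wheel was ever published is to probe the download URL before pip attempts the source-build fallback. The URL pattern below is purely hypothetical; the real pattern is whatever setup.py constructs as wheel_url around line 217:

import urllib.error
import urllib.request

import torch

torch_ver = torch.__version__.split("+")[0]   # e.g. "2.4.0"
cuda_ver = torch.version.cuda                 # e.g. "12.1"

# Illustrative URL only; substitute the naming scheme setup.py actually uses.
wheel_url = (
    "https://github.com/bytedance/flux/releases/download/v1.0.2/"
    f"byte_flux-1.0.2+torch{torch_ver}cu{cuda_ver}-cp310-cp310-linux_x86_64.whl"
)

request = urllib.request.Request(wheel_url, method="HEAD")
try:
    with urllib.request.urlopen(request) as response:
        print("Prebuilt wheel found, HTTP", response.status)
except urllib.error.HTTPError as err:
    print("No prebuilt wheel for this torch/CUDA combination, HTTP", err.code)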

wenlei-bao commented 2 months ago

@zheng-ningxin can we prioritize this issue, since it is blocking @tlrmchlsmth's PR from merging? I think we just missed torch 2.4, right?

wenlei-bao commented 2 months ago

(Quoting @tlrmchlsmth's torch 2.4 / CUDA 12.1 report and pip freeze output above.)

Sorry for the late reply, @tlrmchlsmth. We will take a look and fix it to unblock you. Thanks.

wenlei-bao commented 2 months ago

@tlrmchlsmth Please check the new release at https://github.com/bytedance/flux/releases/tag/v1.0.3, made by @zheng-ningxin, which adds support for torch 2.4 and CUDA 12.1.