facebookresearch / dlrm

An implementation of a deep learning recommendation model (DLRM)
MIT License

Docker Build failing #373

Open JerryQGui opened 8 months ago

JerryQGui commented 8 months ago

Running docker build on this repo fails during the pip install step:

[3/6] RUN pip install -r requirements.txt:
7 2.610 Collecting future (from -r requirements.txt (line 1))
7 3.026 Downloading https://files.pythonhosted.org/packages/8f/2e/cf6accf7415237d6faeeebdc7832023c90e0282aa16fd3263db0eb4715ec/future-0.18.3.tar.gz (840kB)
7 5.126 Requirement already satisfied: numpy in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (1.17.2)
7 5.161 Collecting onnx (from -r requirements.txt (line 3))
7 5.621 Downloading https://files.pythonhosted.org/packages/8f/71/1543d8dad6a26df1da8953653ebdbedacea9f1a5bcd023fe10f8c5f66d63/onnx-1.14.1.tar.gz (11.3MB)
7 16.23 Installing build dependencies: started
7 23.24 Installing build dependencies: finished with status 'done'
7 23.24 Getting requirements to build wheel: started
7 23.96 Getting requirements to build wheel: finished with status 'error'
7 23.96 ERROR: Command errored out with exit status 1:
7 23.96 command: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp7jnw462n
7 23.96 cwd: /tmp/pip-install-ylpvvqkf/onnx
7 23.96 Complete output (19 lines):
7 23.96 fatal: Not a git repository (or any of the parent directories): .git
7 23.96 Traceback (most recent call last):
7 23.96   File "/opt/conda/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 207, in <module>
7 23.96     main()
7 23.96   File "/opt/conda/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 197, in main
7 23.96     json_out['return_val'] = hook(**hook_input['kwargs'])
7 23.96   File "/opt/conda/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py", line 54, in get_requires_for_build_wheel
7 23.96     return hook(config_settings)
7 23.96   File "/tmp/pip-build-env-racybm4e/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 163, in get_requires_for_build_wheel
7 23.96     config_settings, requirements=['wheel'])
7 23.96   File "/tmp/pip-build-env-racybm4e/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 143, in _get_build_requires
7 23.96     self.run_setup()
7 23.96   File "/tmp/pip-build-env-racybm4e/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 268, in run_setup
7 23.96     self).run_setup(setup_script=setup_script)
7 23.96   File "/tmp/pip-build-env-racybm4e/overlay/lib/python3.6/site-packages/setuptools/build_meta.py", line 158, in run_setup
7 23.96     exec(compile(code, file, 'exec'), locals())
7 23.96   File "setup.py", line 85, in <module>
7 23.96     assert CMAKE, "Could not find cmake executable!"
7 23.96 AssertionError: Could not find cmake executable!
7 23.96 ----------------------------------------
7 24.74 ERROR: Command errored out with exit status 1: /opt/conda/bin/python /opt/conda/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmp7jnw462n Check the logs for full command output.

executor failed running [/bin/sh -c pip install -r requirements.txt]: exit code: 1
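
The log shows pip downloading a source tarball (onnx-1.14.1.tar.gz) and onnx's setup.py then asserting that cmake is available. A quick way to confirm the missing tool inside the image or environment is a PATH check like the one below. This is only a minimal sketch for a POSIX shell; the helper name `have_tool` is hypothetical and not part of this repo.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the named tool is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether cmake is visible to the shell that would run pip.
if have_tool cmake; then
    echo "cmake found: $(command -v cmake)"
else
    echo "cmake not found; building onnx from source would hit the assertion in the log"
fi
```

If the check reports that cmake is missing, either install cmake before the pip step or avoid the source build of onnx altogether.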

Raynchowkw commented 7 months ago

I also hit the cmake error when installing onnx. Here is my workaround; note that I am using conda, not Docker. The Dockerfile is older than requirements.txt, but even requirements.txt is out of date.

  1. Delete onnx from requirements.txt.
  2. I ran these commands in my conda env:
    conda install pip
    conda install python=3.10
    pip install -r requirements.txt
    pip install tensorboard
    git clone https://github.com/mlperf/logging.git mlperf-logging
    pip install -e mlperf-logging
    pip install onnx
  3. python dlrm_s_pytorch.py then ran successfully.
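
Step 1 can also be done without editing requirements.txt by hand. The sketch below assumes a POSIX shell with grep; the helper name `drop_onnx` and the output file name `requirements.no-onnx.txt` are hypothetical. It prints the requirements file with the onnx entry filtered out and leaves other packages whose names merely start with "onnx" untouched.

```shell
#!/bin/sh
# Hypothetical helper: print a requirements file without its onnx entry.
# Matches "onnx" alone or followed by a version specifier or marker,
# so a package such as onnxruntime is not removed.
drop_onnx() {
    grep -v -E '^[[:space:]]*onnx[[:space:]]*([<>=!~;].*)?$' "$1"
    # grep exits with 1 when nothing is left to print; treat that as success.
    [ "$?" -le 1 ]
}

# Usage sketch, assuming the current directory is the repo root:
#   drop_onnx requirements.txt > requirements.no-onnx.txt
#   pip install -r requirements.no-onnx.txt
#   pip install onnx
```

Writing to a separate file keeps the tracked requirements.txt unchanged, so the workaround is easy to drop once the repo is updated.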

I did not pin torch to 1.3.1 as the Dockerfile does. Hope this helps.