databricks / megablocks

Error from pip about missing torch module #78

Closed michaelwhitford closed 5 months ago

michaelwhitford commented 6 months ago

I am trying to install megablocks for use with vllm. I have a venv set up for vllm, and have it installed and working fine with non-Mixtral models.

I am using Python 3.11.6 installed with Homebrew.

When I try to install megablocks I see the following output:

pip install megablocks
...
ModuleNotFoundError: No module named 'torch'
...
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

When I look at the packages installed I see that torch==2.1.2 is in fact installed:

λ pip freeze | grep torch
torch==2.1.2

What am I doing wrong?

andrewssobral commented 6 months ago

Same problem for me. I am using the following Dockerfile, and the last line fails:

RUN python${PYTHON_VERSION} -m pip install megablocks==0.5.0

Any idea?

# docker build -t ogrerun/vllm-cuda:12.1-py3.10 .
# docker run -it --rm --gpus all ogrerun/vllm-cuda:12.1-py3.10 bash
FROM nvcr.io/nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04

# Install dependencies
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y build-essential checkinstall gdb lcov pkg-config wget \
    libbz2-dev libffi-dev libgdbm-dev libgdbm-compat-dev liblzma-dev \
    libncurses5-dev libncursesw5-dev libreadline6-dev libsqlite3-dev libssl-dev \
    lzma lzma-dev tk-dev uuid-dev zlib1g-dev libc6-dev libgpm2

# The latest version of Python 3.10 is 3.10.13
ENV PYTHON_MAJOR_VERSION=3
ENV PYTHON_MINOR_VERSION=10
ENV PYTHON_PATCH_VERSION=13
ENV PYTHON_VERSION=${PYTHON_MAJOR_VERSION}.${PYTHON_MINOR_VERSION}
ENV PYTHON_FULL_VERSION=${PYTHON_MAJOR_VERSION}.${PYTHON_MINOR_VERSION}.${PYTHON_PATCH_VERSION}

# Install Python
# RUN apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-venv python${PYTHON_VERSION}-dev python${PYTHON_MAJOR_VERSION}-pip

# Build Python from source
RUN wget https://www.python.org/ftp/python/${PYTHON_FULL_VERSION}/Python-${PYTHON_FULL_VERSION}.tgz
RUN tar -zxvf Python-${PYTHON_FULL_VERSION}.tgz
# RUN cd Python-${PYTHON_FULL_VERSION} && ./configure --enable-optimizations && make -j8 && make install
# RUN cd Python-${PYTHON_FULL_VERSION} && ./configure && make altinstall
RUN cd Python-${PYTHON_FULL_VERSION} && ./configure && make -j8 build_all && make install

# Update pip
RUN python${PYTHON_VERSION} -m pip install --upgrade pip

# Install required python packages
RUN python${PYTHON_VERSION} -m pip install transformers==4.36.2 accelerate==0.25.0 autoawq==0.1.8 vllm==0.2.6
RUN python${PYTHON_VERSION} -m pip install megablocks==0.5.0

Here are the python packages available before doing the pip install megablocks:

root@eb163e6f48bf:/# pip freeze
absl-py==2.0.0
accelerate==0.25.0
aiohttp==3.9.1
aioprometheus==23.12.0
aiosignal==1.3.1
anyio==4.2.0
async-timeout==4.0.3
attributedict==0.3.0
attrs==23.1.0
autoawq==0.1.8
blessings==1.7
cachetools==5.3.2
certifi==2023.11.17
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
codecov==2.1.13
colorama==0.4.6
coloredlogs==15.0.1
colour-runner==0.1.1
coverage==7.4.0
DataProperty==1.0.1
datasets==2.16.1
deepdiff==6.7.1
dill==0.3.7
distlib==0.3.8
evaluate==0.4.1
exceptiongroup==1.2.0
fastapi==0.108.0
filelock==3.13.1
frozenlist==1.4.1
fsspec==2023.10.0
h11==0.14.0
httptools==0.6.1
huggingface-hub==0.20.1
humanfriendly==10.0
idna==3.6
inspecta==0.1.3
Jinja2==3.1.2
joblib==1.3.2
jsonlines==4.0.0
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
lm_eval==0.4.0
lxml==5.0.0
MarkupSafe==2.1.3
mbstrdecoder==1.1.3
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
networkx==3.2.1
ninja==1.11.1.1
nltk==3.8.1
numexpr==2.8.8
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
ordered-set==4.1.0
orjson==3.9.10
packaging==23.2
pandas==2.1.4
pathvalidate==3.2.0
peft==0.7.1
Pillow==10.1.0
platformdirs==4.1.0
pluggy==1.3.0
portalocker==2.8.2
protobuf==4.25.1
psutil==5.9.7
pyarrow==14.0.2
pyarrow-hotfix==0.6
pybind11==2.11.1
pydantic==1.10.13
Pygments==2.17.2
pyproject-api==1.6.1
pytablewriter==1.2.0
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
PyYAML==6.0.1
quantile-python==1.1
ray==2.9.0
referencing==0.32.0
regex==2023.12.25
requests==2.31.0
responses==0.18.0
rootpath==0.1.1
rouge_score==0.1.2
rpds-py==0.16.2
sacrebleu==2.4.0
safetensors==0.4.1
scikit-learn==1.3.2
scipy==1.11.4
sentencepiece==0.1.99
six==1.16.0
sniffio==1.3.0
sqlitedict==2.1.0
starlette==0.32.0.post1
sympy==1.12
tabledata==1.3.3
tabulate==0.9.0
tcolorpy==0.1.4
termcolor==2.4.0
texttable==1.7.0
threadpoolctl==3.2.0
tokenizers==0.15.0
toml==0.10.2
tomli==2.0.1
torch==2.1.2
torchvision==0.16.2
tox==4.11.4
tqdm==4.66.1
tqdm-multiprocess==0.0.11
transformers==4.36.2
triton==2.1.0
typepy==1.3.2
typing_extensions==4.9.0
tzdata==2023.4
urllib3==2.1.0
uvicorn==0.25.0
uvloop==0.19.0
virtualenv==20.25.0
vllm==0.2.6
watchfiles==0.21.0
websockets==12.0
xformers==0.0.23.post1
xxhash==3.4.1
yarl==1.9.4
zstandard==0.22.0

tgale96 commented 6 months ago

Hi! Sorry for the delay!

This is strange - I haven't seen it before. The issue seems specific to installing MegaBlocks with a custom Python installation (at least, I haven't seen or heard of this elsewhere).

I've repro'd the issue with your Dockerfile (thanks for that!). I tried a few different things to no avail:

In our setup.py, we import Torch to build the custom extensions. I am not sure why Torch appears to be unavailable when that is executed.

mvpatel2000 commented 6 months ago

In other projects with pyproject.toml, I have seen that build isolation causes a similar issue where setup.py requires torch. Could you try --no-build-isolation and see if that would help?

I'm not completely sure why you would go into build isolation in this case though...
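For context, a setup.py that imports torch at module level can only run where torch is importable. When pip builds in an isolated environment, packages from the surrounding venv are not visible to it, which would explain the error even though `pip freeze` lists torch. Below is a minimal sketch of a guard that turns the bare ModuleNotFoundError into an actionable message. The helper name `require_build_dependency` is hypothetical, and this is not MegaBlocks' actual setup.py.

```python
import importlib


def require_build_dependency(module_name):
    """Import a module needed at build time, or fail with an actionable hint.

    Hypothetical helper for illustration; not part of MegaBlocks or pip.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Under build isolation, packages from the user's venv are not visible,
        # so an installed package can still be reported as missing here.
        raise RuntimeError(
            f"{module_name!r} is not importable in the build environment. "
            "If it is already installed, pip may be building in an isolated "
            "environment; try `pip install --no-build-isolation <package>`."
        ) from exc


# In a setup.py this would run before the extensions are configured, e.g.:
# torch = require_build_dependency("torch")
```

The guard does not change where the build runs. It only makes the failure mode easier to recognize.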

michaelwhitford commented 5 months ago

I added --no-build-isolation and it worked!

(vllm) λ pip install megablocks --no-build-isolation
Collecting megablocks
  Downloading megablocks-0.5.0.tar.gz (47 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.7/47.7 KB 1.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting stanford-stk>=0.0.6
  Downloading stanford-stk-0.0.6.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: triton==2.1.0 in ./venv/lib/python3.10/site-packages (from megablocks) (2.1.0)
Requirement already satisfied: filelock in ./venv/lib/python3.10/site-packages (from triton==2.1.0->megablocks) (3.13.1)
Using legacy 'setup.py install' for megablocks, since package 'wheel' is not installed.
Using legacy 'setup.py install' for stanford-stk, since package 'wheel' is not installed.
Installing collected packages: stanford-stk, megablocks
  Running setup.py install for stanford-stk ... done
  Running setup.py install for megablocks ... done
Successfully installed megablocks-0.5.0 stanford-stk-0.0.6
pip install megablocks --no-build-isolation  42.44s user 10.11s system 113% cpu 46.380 total
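With `--no-build-isolation`, pip does not install build dependencies for you, so they have to be present in the environment already. The working invocation above can be wrapped in a small pre-flight check, sketched below. The function name `no_isolation_install_command` is hypothetical, and the default of checking only for torch is an assumption based on this thread rather than a confirmed list of MegaBlocks' build requirements.

```python
import importlib.util
import sys


def no_isolation_install_command(requirement, build_modules=("torch",)):
    """Return a pip command that installs `requirement` without build isolation.

    Hypothetical helper for illustration. `build_modules` lists top-level
    modules the package's setup.py is assumed to import at build time.
    """
    # find_spec returns None for a top-level module that is not installed.
    missing = [name for name in build_modules
               if importlib.util.find_spec(name) is None]
    if missing:
        raise RuntimeError(
            "Install these before building without isolation: "
            + ", ".join(missing)
        )
    # Use the current interpreter so pip targets the same environment
    # in which the check above was performed.
    return [sys.executable, "-m", "pip", "install",
            "--no-build-isolation", requirement]
```

The returned list can be passed to `subprocess.run(..., check=True)`, for example with `"megablocks==0.5.0"` as the requirement. Running pip through `sys.executable` mirrors the `python${PYTHON_VERSION} -m pip` form used in the Dockerfile and avoids picking up a pip that belongs to a different interpreter.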