CCInc / 3d-ml

A versatile framework for 3D machine learning built on PyTorch Lightning and Hydra [looking for contributors!]

Installation issues #34

Closed: aaronfderybel closed this issue 1 year ago

aaronfderybel commented 1 year ago

Hi @CCInc ,

I've tried installing this repo and encountered an error while running ./install_openpoints.sh.

My software versions (using pip in a virtual environment):
- Python: 3.10.8
- pip: 22.3.1
- nvcc: Cuda compilation tools, release 11.6, V11.6.124 (Build cuda_11.6.r11.6/compiler.31057947_0)
- gcc: 7.5
- OS: Ubuntu 18.04.6 LTS (Release 18.04, codename bionic)
- Kernel: 5.4.0-125-generic

Commands run:

# add --recursive, otherwise the openpoints folder is empty (it's a git submodule)
git clone https://github.com/CCInc/3d-ml.git --recursive

# create a virtual env with Python and activate it
cd 3d-ml
python -m virtualenv env_3d
source env_3d/bin/activate

# install PyTorch with pip
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
# install PyTorch Geometric with pip, ${CUDA} = cu116
pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
pip install torch-geometric

# install additional requirements
pip install -r requirements.txt

# install openpoints as root
sudo ./install_openpoints.sh
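Because install_openpoints.sh compiles CUDA extensions against the installed PyTorch, one sanity check worth doing before running it is that the CUDA version PyTorch was built with (`torch.version.cuda`) matches the local `nvcc`, since a mismatch can break extension builds. The helper below is a hypothetical sketch, not part of 3d-ml or openpoints: it only reports both versions and whether their major.minor parts agree. For the setup above (cu116 wheels, nvcc 11.6) they should match, so this check would not by itself explain a source-level compile error.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def major_minor(version: Optional[str]) -> Optional[str]:
    """Reduce a version such as '11.6.124' to '11.6'."""
    if not version:
        return None
    return ".".join(version.split(".")[:2])


def check_cuda_toolchain() -> bool:
    """Compare the CUDA version torch was built with against the local nvcc."""
    try:
        import torch
    except ImportError:
        print("torch is not importable from this interpreter")
        return False
    torch_cuda = major_minor(torch.version.cuda)  # None for CPU-only wheels
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
        return False
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    nvcc_cuda = parse_nvcc_release(out)
    print(f"torch built with CUDA {torch_cuda}, nvcc reports {nvcc_cuda}")
    return torch_cuda is not None and torch_cuda == nvcc_cuda


if __name__ == "__main__":
    check_cuda_toolchain()
```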

I receive the following errors:

cuda/emd_kernel.cu(178): error: identifier "CHECK_EQ" is undefined

cuda/emd_kernel.cu(265): error: identifier "CHECK_EQ" is undefined

cuda/emd_kernel.cu(382): error: identifier "CHECK_EQ" is undefined

I also receive some warnings:

.local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.

.local/lib/python3.10/site-packages/setuptools/command/easy_install.py:160: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.

What is the recommended way to install this repo if not using conda? I also noticed that Python <= 3.8 is not supported.

leo-stan commented 1 year ago
  1. Have you tried running the test training even with the warnings? If so, where does it fail?
  2. Can you tell us which command exactly gives the warning?
  3. Any chance you could try installing using Conda, as it's the preferred way of installation at this point? Just to make sure that works.

Thanks :)

aaronfderybel commented 1 year ago

1. Have you tried running the test training even with the warnings? If so, where does it fail?
Running python src/train.py model=cls_pointnet++ data=cls_modelnet2048 gives multiple errors. I attached the stack trace: stacktrace.txt

2. Can you tell us which command exactly gives the warning?
The warnings and errors above come from running sudo ./install_openpoints.sh (the openpoints install script).

3. Any chance you could try installing using Conda, as it's the preferred way of installation at this point? Just to make sure that works.
I currently have other things set up on my local machine, and installing conda could break some of them, so I would rather not. I will look into it if there is no easy fix or if installation through pip turns out to be poorly supported.

CCInc commented 1 year ago

Hi @aaronfderybel !

I think this is an OpenPoints issue. I'll try to open a PR shortly for it and let you know.

Installation with pip should be no problem. If it ends up working for you, maybe you could add a how-to to the docs?

CCInc commented 1 year ago

@aaronfderybel I was able to reproduce it with the following docker image:

FROM nvidia/cuda:11.6.1-devel-ubuntu20.04

RUN apt-get update && apt-get -y install git wget

# Install miniconda3

WORKDIR /root
ENV MINICONDA3 /root/miniconda3
ARG TARGETPLATFORM
RUN mkdir -p $MINICONDA3 \
    && echo "I'm building for TARGETPLATFORM=${TARGETPLATFORM}" \
    && case ${TARGETPLATFORM} in \
         "linux/arm64")  MINI_ARCH=aarch64  ;; \
         *) MINI_ARCH=x86_64  ;; \
    esac \
    && wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-${MINI_ARCH}.sh -O $MINICONDA3/miniconda.sh \
    && chmod +x $MINICONDA3/miniconda.sh \
    && $MINICONDA3/miniconda.sh -b -u -p $MINICONDA3 \
    && rm -rf $MINICONDA3/miniconda.sh
ENV PATH="/root/miniconda3/bin:${PATH}"

RUN conda init bash \
    && conda install -y python=3.9

ENV TORCH_CUDA_ARCH_LIST "7.5 8.0 8.6"

#install pytorch with pip
RUN pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
#install pytorch geo with pip, ${CUDA} = cu116
RUN pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
RUN pip install torch-geometric

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . .
RUN ./install_openpoints.sh
# RUN cd openpoints/cpp/emd && python setup.py install --user

Can you try the fix in https://github.com/guochengqian/openpoints/pull/9 and let me know if it works?

aaronfderybel commented 1 year ago

Hello @CCInc ,

Installation with pip should be no problem. If it ends up working for you, maybe you could add a how-to to the docs? => No problem; once it works on my machine, I can provide this.

I've applied the changes from https://github.com/guochengqian/openpoints/pull/9 to openpoints. This solved the error during installation with ./install_openpoints.sh.

However, when I run the basic example, python src/train.py model=cls_pointnet++ data=cls_modelnet2048, I receive a bunch of import errors related to openpoints modules:

AttributeError: module 'src.models.classification' has no attribute 'openpoints_module'
ModuleNotFoundError: No module named 'chamfer'
Are you sure that 'openpoints_module' is importable from module 'src.models.classification'?

I'm using a virtual environment and Python 3.9. Maybe I should add something to my PATH variable so these modules can be loaded? Or what do you think the problem is here?

leo-stan commented 1 year ago

Yeah, that's the typical error we get when the openpoints install is not successful.

  1. Can you run import src.models.classification.openpoints_module from within your virtual env and tell us if that works?
  2. What model of GPU are you using?
aaronfderybel commented 1 year ago

I've managed to make it work on my machine. The problem was that some of the modules were not installed in the virtual environment but outside it, in my main Python install.

I've adapted ./install_openpoints.sh by removing the --user argument. The chamfer and other modules now show up in pip list and can be imported.
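A --user install puts packages in the user site-packages directory (for example ~/.local/lib/python3.x/site-packages), which a virtualenv normally does not put on sys.path; running the script through sudo may also pick up root's Python instead of the venv's. The hypothetical diagnostic below, run with the venv activated, reports the active interpreter, whether it is a virtualenv, and where each module would be imported from. torch and chamfer are the only module names taken from this thread; the other extension modules built by install_openpoints.sh would need to be added by hand.

```python
import importlib.util
import os
import site
import sys
from typing import List, Optional


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv/virtualenv."""
    # Legacy virtualenv (<20) sets sys.real_prefix; venv-style envs change sys.prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def locate_module(name: str) -> Optional[str]:
    """Return the path a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin or "<namespace package>"


def is_inside(path: str, prefix: str) -> bool:
    """True if `path` is an absolute path located under `prefix`."""
    if not os.path.isabs(path):
        return False
    path, prefix = os.path.realpath(path), os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def report(names: List[str]) -> None:
    print(f"interpreter:   {sys.executable}")
    print(f"in virtualenv: {in_virtualenv()}")
    print(f"user site:     {site.getusersitepackages()} (enabled: {site.ENABLE_USER_SITE})")
    for name in names:
        origin = locate_module(name)
        if origin is None:
            print(f"{name}: NOT importable from this interpreter")
        elif is_inside(origin, sys.prefix):
            print(f"{name}: {origin}")
        else:
            print(f"{name}: found OUTSIDE the active environment: {origin}")


if __name__ == "__main__":
    # torch and chamfer come from this thread; extend with other openpoints modules.
    report(["torch", "chamfer"])
```

A module reported as outside the environment (or not importable at all) points to the same situation described above: it was installed for a different interpreter or site directory.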

I have two GeForce RTX 2080 Ti GPUs; by default, one is used. I can now run the basic example by setting num_workers: 12 and batch_size: 32 (the defaults are 15 workers and a batch size of 64, which makes my GPU run out of memory).
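With Hydra, values like these can usually be overridden on the command line as key=value pairs instead of editing the YAML files. The helper below is a hypothetical sketch that builds and prints such a command; the data.batch_size and data.num_workers key paths are assumptions, since the thread does not say where these settings live. If the keys are defined elsewhere, Hydra will reject the override and report that the key is not in the config.

```python
import shlex
import subprocess
import sys
from typing import Dict, List


def train_command(overrides: Dict[str, object]) -> List[str]:
    """Build a `python src/train.py key=value ...` command with Hydra overrides."""
    cmd = [sys.executable, "src/train.py"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


def run_training(overrides: Dict[str, object], dry_run: bool = True) -> List[str]:
    """Print the command; only launch training when dry_run is False."""
    cmd = train_command(overrides)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    run_training({
        "model": "cls_pointnet++",
        "data": "cls_modelnet2048",
        # ASSUMPTION: both keys live in the `data` config group.
        "data.batch_size": 32,
        "data.num_workers": 12,
    })
```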

Installation with pip should be no problem. If it ends up working for you, maybe you could add a how-to to the docs? => How would you like me to do this? Should I add a section to the README file through a pull request?

CCInc commented 1 year ago

Great, glad to hear it's working!

If you can open a PR, that would be great. Maybe title the existing installation procedure "Conda Installation (recommended)" and then add another subsection, "Pip Installation", adapting the steps as necessary.

Feel free to also include the fixes you had to do to the install_openpoints.sh file in there as well. Thanks!

aaronfderybel commented 1 year ago

Hi @CCInc ,

I see that the openpoints submodule version in the main branch does not contain the fix from https://github.com/guochengqian/openpoints/pull/9. Does this fix need to be included in the openpoints main branch, or does it break other things?

CCInc commented 1 year ago

Hi @aaronfderybel, I updated it in #40; it should be all good now!