alex-petrenko / megaverse

High-throughput simulation platform for Artificial Intelligence research
https://www.megaverse.info
MIT License
220 stars, 20 forks

non-zero exit status 2 while running python setup.py develop (exiting after building target assimp) #27

Open francelico opened 1 year ago

francelico commented 1 year ago

Hello,

I'm having trouble building from source. At first I thought it was similar to #17, but the build seems to fail a little earlier, right after assimp finishes building.

I've tried doing the checks @alex-petrenko recommended in #17 and I am getting the expected outputs.

First of all, please make sure that you executed the setup_env script in Vulkan SDK, i.e. following the instruction:

$ cd vulkansdk/1.2.162.0
$ source ./setup-env.sh

After you've done this, check that VULKAN_SDK env variable is set by running this in your terminal:

$ echo $VULKAN_SDK

To make sure the SDK headers contain the correct macros (the build procedure seems to be complaining about VK_ERROR_UNKNOWN), can you please search for it in the SDK folder, i.e. like this:

$ grep -r 'VK_ERROR_UNKNOWN' ./x86_64/include

Expected output is something like:

./x86_64/include/vulkan/vk_enum_string_helper.h: case VK_ERROR_UNKNOWN:
./x86_64/include/vulkan/vk_enum_string_helper.h: return "VK_ERROR_UNKNOWN";
./x86_64/include/vulkan/vulkan.hpp: eErrorUnknown = VK_ERROR_UNKNOWN,
./x86_64/include/vulkan/vulkan_core.h: VK_ERROR_UNKNOWN = -13,
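The two checks above can be combined into one function. This is a minimal sketch, assuming VULKAN_SDK points at the SDK's x86_64 directory (as in the Dockerfile further down); the function name check_vulkan_sdk is hypothetical and not part of megaverse or the Vulkan SDK.

```shell
# Sketch: verify the Vulkan SDK environment before starting the build.
# Assumption: VULKAN_SDK points at the SDK's x86_64 directory.
# check_vulkan_sdk is a hypothetical helper name.
check_vulkan_sdk() {
    # Step 1: the environment variable must be set and non-empty.
    if [ -z "${VULKAN_SDK:-}" ]; then
        echo "VULKAN_SDK is not set; source setup-env.sh first" >&2
        return 1
    fi
    # Step 2: the SDK headers must define VK_ERROR_UNKNOWN.
    if ! grep -rqs 'VK_ERROR_UNKNOWN' "${VULKAN_SDK}/include"; then
        echo "VK_ERROR_UNKNOWN not found under ${VULKAN_SDK}/include" >&2
        return 2
    fi
    echo "Vulkan SDK looks OK: ${VULKAN_SDK}"
}
```

Calling check_vulkan_sdk in the same terminal session that will run the build confirms that the build sees the same environment.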

However, even though all the test commands run fine (as well as the ones on the Vulkan SDK website), I have noticed that the Vulkan SDK is supported on Ubuntu 20.04 and later. I am running Ubuntu 18.04.

Full output of the build command: setup_outputs.txt

francelico commented 1 year ago

Other things I've tried:

This is the modified Dockerfile.base I ran:

FROM nvidia/cudagl:10.2-devel-ubuntu18.04
# FROM nvidia/cudagl:11.0-devel-ubuntu20.04
# FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04

# Set up locale to prevent bugs with encoding
ENV LC_ALL=C.UTF-8 LANG=C.UTF-8

RUN apt-get update || true && apt-get install -y \
    libcudnn8 \
    libglvnd0 libgl1 libglx0 libegl1 \
    libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev \
    wget curl git zlib1g-dev \
    libglib2.0-0 libsm6 libxext6 libxrender-dev \
    python3 python3-pip cmake \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

WORKDIR /vulkan
COPY vulkansdk-linux-x86_64-1.2.198.1.tar.gz .
RUN tar -xzf vulkansdk-linux-x86_64-1.2.198.1.tar.gz

ENV VULKAN_SDK=/vulkan/1.2.198.1/x86_64
ENV PATH=${VULKAN_SDK}/bin:${PATH}
ENV LD_LIBRARY_PATH=${VULKAN_SDK}/lib:${LD_LIBRARY_PATH:-}
ENV VK_LAYER_PATH=${VULKAN_SDK}/etc/vulkan/explicit_layer.d

RUN vulkaninfo

WORKDIR /

RUN cd /usr/bin \
    && ln -s python3 python \
    && ln -s pip3 pip

RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash Miniconda3-latest-Linux-x86_64.sh -p /miniconda -b && \
    rm Miniconda3-latest-Linux-x86_64.sh

ENV PATH=/miniconda/bin:${PATH}
RUN conda update -y conda && conda --version

WORKDIR /workspace

RUN git clone https://github.com/alex-petrenko/sample-factory.git
RUN git clone https://github.com/alex-petrenko/megaverse.git
WORKDIR /workspace/megaverse

RUN conda env create -f environment.yml
RUN conda init bash
SHELL ["conda", "run", "-n", "megaverse", "/bin/bash", "-c"]

RUN git submodule update --init --recursive
RUN python setup.py develop
RUN pip install -e .

alex-petrenko commented 1 year ago

Hi @francelico ! Do you think this is Ubuntu 18-specific? I.e. it works on 20 but fails in 18?

It would be very helpful if you could run again with make -j1 to do the build in a single process. Because at this point it's not actually clear what's failing. Like I don't see a clear compilation or linking error in the logs.

alex-petrenko commented 1 year ago

You can do it by adding arguments -j and 1 here https://github.com/alex-petrenko/megaverse/blob/c38436d69b7a2b6d77ed9de06f6a5e69251696e3/setup.py#L105

You can also manually run in the terminal from the root of the project:

mkdir build
cd build
cmake ..
make -j1

francelico commented 1 year ago

Hi @alex-petrenko ,

I don't think this is distribution-specific, as I get the same error on a clean install of Ubuntu 22 on a separate machine. I didn't try 20 yet though, should I?

Log output with the -j1 option, on a clean Ubuntu 22.04.2 install: out.log

alex-petrenko commented 1 year ago

@francelico sorry, maybe I'm missing something, but the log looks like it just abruptly ends. I don't see any compilation or linking errors. Does the build just stop here? Can you make sure you log both stdout and stderr? Without seeing the error it is hard to understand what is happening.
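For reference, compiler and linker errors are normally written to stderr, so a plain `> out.log` redirect misses them. A minimal sketch of capturing both streams in one file follows; run_logged is a hypothetical helper name, and the log is printed only after the command finishes rather than streamed live.

```shell
# Sketch: run a command and capture stdout AND stderr in one log file.
# run_logged is a hypothetical helper, not part of megaverse.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log_file="$1"
    shift
    status=0
    # "2>&1" sends stderr to the same place as stdout (the log file).
    "$@" >"$log_file" 2>&1 || status=$?
    # Show the captured output, then hand back the command's exit status.
    cat "$log_file"
    return "$status"
}
```

For example, `run_logged out.log make -j1` or `run_logged out.log python setup.py develop` would leave the complete output, including any error messages, in out.log.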

francelico commented 1 year ago

Hi @alex-petrenko , apologies for the delayed reply. Indeed I double checked and I was only logging stdout in my original file. I'm attaching the full log with both stdout and stderr.

out.log

alex-petrenko commented 1 year ago

@francelico frustrating as it might be, I still have no idea what the actual error is. I can't find a concrete linking or compiling error in the logs. I can indeed see that the build is failing (with the error code 2) but it's impossible to figure out why without seeing the actual compilation output.

There's a million warnings from different libraries we include, but no errors to be seen.
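One way to sift a long log for the first real failure is to search for the usual error markers while skipping warnings. This is only a sketch: the pattern assumes GCC/Clang-style "error:" lines, "CMake Error" messages, and GNU make's "*** [target] Error N" lines, and first_errors is a hypothetical helper name.

```shell
# Sketch: print the first few error lines of a build log, with line numbers.
# first_errors is a hypothetical helper; the pattern is an assumption about
# typical GCC/Clang, CMake and GNU make output and may need adjusting.
# Usage: first_errors LOGFILE [MAX_LINES]
first_errors() {
    log_file="$1"
    max_lines="${2:-5}"
    # Case-sensitive on purpose: warning lines and flags such as
    # -Werror=... do not contain "error:" and so are not matched.
    grep -n -E 'error:|CMake Error|\*\*\* .*Error [0-9]+' "$log_file" \
        | head -n "$max_lines"
}
```

Running `first_errors out.log` on the attached log would list candidate lines; no output means none of these markers occur in the file.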

I recommend that you try a standalone build, i.e. just regular cmake and then make, without the Python libraries first. You can pass the option -j1 to make to make sure it builds single-thread and stops immediately when the compiler fails. You can then just copy-paste the error from the terminal.

Building C++ programs from sources can be frustrating like this, I guess this is one reason everyone uses python for everything. But hey, you can't write a million FPS renderer in Python :)

francelico commented 1 year ago

@alex-petrenko indeed, that huge boost in FPS alone makes it worth it to fight a bit for it :)

Building from the command line fails much earlier and throws a different compilation error (I am building from the src/ directory as there is no CMakeLists.txt at the root). Attaching the log:

out.log