Booritas / slideio

BSD 3-Clause "New" or "Revised" License

Improve development and deployment using Containerisation (Docker Images) #23

Open oskarthaeter opened 8 months ago

oskarthaeter commented 8 months ago

Summary:

I am proposing the introduction of Docker containerization to the slideio project to simplify the development and deployment processes, especially regarding dependency management and compiler standardization.

Background:

slideio works flawlessly via the Python bindings, which are easy to install. However, setting up the environment for C++ usage can be challenging because of the library's complex dependencies and its reliance on specific versions of external libraries. Additionally, the library aims to support a wide range of Linux distributions dating back to 2014, which adds further complexity in terms of compiler standardization.

Issue:

Complex dependencies and outdated documentation make it difficult for new users to set up a development environment for the library. Because the project targets C++14, Boost is needed to supply functionality that later C++ standards provide natively (e.g. boost::filesystem, standardized as std::filesystem in C++17), which adds an otherwise unnecessary dependency.

Proposed Solution:

Implementing Docker containerization for slideio would address these challenges by:

  1. Streamlining the Development Environment: Docker would allow us to create a consistent development environment across different systems. This eliminates the issues arising from dependency management and varying OS/compiler configurations.

  2. Facilitating Adoption of Newer C++ Standards: By containerizing the development environment, we can include specific compiler versions that support newer C++ standards, enabling us to modernize the codebase without worrying about compatibility with older Linux distributions.

  3. Ease of Deployment: Docker containers can simplify the process of deploying slideio, making it more accessible to new users and contributors.

Conclusion:

I am convinced that Docker containerization could be a substantial improvement to the slideio project, from both a development and a user standpoint. It aligns with @Booritas's earlier mention of the challenges caused by the dependencies and by having to support many Linux distributions.

Thank you for considering this proposal.

Booritas commented 8 months ago

Thank you for your proposal; it makes a lot of sense. We need to take into account that the following operating systems are supported by slideio:

Providing a Docker container will facilitate development of the library. Originally, Docker containers were Linux-based; Windows containers appeared later, and recently I've seen news about macOS containers. I had a bad experience with Windows containers, and I would expect that macOS containers are still not in a good state. Please correct me if I'm wrong.

For Linux, we have a broad range of available containers. For creating deployments, I use manylinux2014_x86_64. Unfortunately, that container is not suitable for development because some missing features prevent VS Code/CLion from connecting to it for debugging. Therefore, we can select any Linux container suitable for development, as long as we make sure the code still compiles under manylinux2014. It makes sense to create two containers: one for active development (my preference would be Ubuntu) and another for checking against manylinux2014. Both containers should have Conan installed with all dependencies. This shouldn't involve much effort.

The use of the latest standards is currently restricted by manylinux2014, which uses the GCC 10 toolset. As far as I know, GCC 10 has good coverage of C++14. In previous versions of the library, I used manylinux2010 with GCC 8, which had quite poor support for the C++ standards and made Boost necessary.

By the way, std::filesystem is a C++17 feature, which is why I haven't used it yet.

So, my questions to you are:

oskarthaeter commented 8 months ago

It looks like manylinux2014 will "reach End of Life (EOL) on June 30th, 2024" as it is based on CentOS7. The newest manylinux_2_28 images are available for x86_64, aarch64, ppc64le and s390x. It is based on GCC 12, which has basically complete coverage of C++20.
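For context on the tag names: under PEP 600, a manylinux tag encodes the minimum glibc version the wheel expects, and the older names are aliases (manylinux2014 stands for manylinux_2_17). A minimal sketch of that mapping follows; the helper name `glibc_floor` is made up for illustration and is not part of any tool mentioned here.

```python
import re
from typing import Tuple

# Legacy aliases defined by PEP 600.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def glibc_floor(tag: str) -> Tuple[int, int]:
    """Return the minimum glibc (major, minor) implied by a manylinux tag.

    Tags may carry an architecture suffix, e.g. "manylinux2014_x86_64".
    """
    for alias, version in LEGACY_ALIASES.items():
        if tag == alias or tag.startswith(alias + "_"):
            return version
    match = re.match(r"manylinux_(\d+)_(\d+)", tag)
    if match is None:
        raise ValueError(f"not a manylinux tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


print(glibc_floor("manylinux2014_x86_64"))
print(glibc_floor("manylinux_2_28_x86_64"))
```

Moving from manylinux2014 to manylinux_2_28 therefore raises the glibc floor from 2.17 to 2.28, which is what drops the oldest distributions from the support matrix.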

For this to work, we should also be compiling the custom dependencies when compiling the source. This would also make your private conan server obsolete. If the compile times become prohibitively long, we may consider a modified docker image where they are already compiled.

In general, it'd be great to get more information on these @slideio/stable dependencies. Why is it necessary to host these specific versions? What makes other versions, hosted on conan center, unsuitable?

oskarthaeter commented 8 months ago

I just realized that continuing support for Windows obviously requires a native Windows environment. I am not familiar with developing for Windows, so unfortunately I can't help with that. As for macOS, a well-documented Dockerfile should suffice for anyone to set up the repo for development.

Booritas commented 8 months ago

It looks like manylinux2014 will "reach End of Life (EOL) on June 30th, 2024" as it is based on CentOS7. The newest manylinux_2_28 images are available for x86_64, aarch64, ppc64le and s390x. It is based on GCC 12, which has basically complete coverage of C++20.

  • The development container should be basically the same as the build image (manylinux_2_28)
  • Windows and macOS(Intel and arm) both support running linux docker containers via Docker Desktop
  • Sure!

For this to work, we should also be compiling the custom dependencies when compiling the source. This would also make your private conan server obsolete. If the compile times become prohibitively long, we may consider a modified docker image where they are already compiled.

In general, it'd be great to get more information on these @slideio/stable dependencies. Why is it necessary to host these specific versions? What makes other versions, hosted on conan center, unsuitable?

It is good to know that manylinux2014 is reaching the end of its life and that there is a better replacement. Let us use manylinux_2_28 for future releases.

We don't have to compile all dependencies for the new container. It would be enough to download them from a Conan server (private or central). You're right; we don't need any Conan server after building such a container.

Regarding the slideio@stable dependencies, there is a small set of private packages in the repository https://github.com/Booritas/conan-recipes that either do not exist in Conan Center or carry some modifications to the original sources. Note: not all recipes in the repository are used, and I need to clean it up. The most important one is jxrLib, a recipe for the JXR codec. There were also some conflicts between dependencies that I resolved by making small changes in the build scripts.

Another reason for creating a copy of the packages is that some packages on the central server do not contain builds for the required OS and compiler, especially for the Mac ARM processor. Creating a copy allows compiling the recipe and saving binaries to avoid recompilation multiple times.

The rest of the packages are copies from the central Conan server. JFrog often rebuilds the packages. I've encountered the problem that the same build does not work one week later because a package was recompiled with another GLib. This happened every time I prepared a new version. Fixing such a problem may take a week or two, taking into account that I have my main job and can only work evenings and weekends. Creating the copies saved me a lot of time.

Booritas commented 8 months ago

I just realized that continuing support for Windows obviously requires a native Windows environment. I am not familiar with developing for Windows, so unfortunately I can't help with that. As for macOS, a well-documented Dockerfile should suffice for anyone to set up the repo for development.

Let us focus on Linux; Windows and Mac can come later. I have a Windows environment at home. Actually, it is my preferred environment because of MS Visual Studio. I also have a Mac with an Intel processor. Deployment builds run on CI anyway and use the corresponding hardware. We have to be careful with CI for Mac because GitHub restricts CI time for free accounts, and build minutes are weighted per platform: Mac builds count with a 30x coefficient, Windows with 10x, and Linux with 1x.
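To make the budget concern concrete, here is a small sketch of how weighted CI minutes add up. The coefficients are the ones quoted in this comment and should be checked against GitHub's current billing documentation; the 20-minute build time is a made-up figure.

```python
# Coefficients as quoted in the comment above (an assumption, not verified here).
MULTIPLIER = {"linux": 1, "windows": 10, "macos": 30}


def billed_minutes(runs):
    """Sum the weighted minutes for an iterable of (os_name, wall_clock_minutes)."""
    return sum(MULTIPLIER[os_name] * minutes for os_name, minutes in runs)


# Hypothetical release build: 20 wall-clock minutes on each platform.
total = billed_minutes([("linux", 20), ("windows", 20), ("macos", 20)])
print(total)  # 20*1 + 20*10 + 20*30 = 820
```

With these numbers the Mac job alone accounts for 600 of the 820 weighted minutes, which is why Mac CI runs are the ones to ration.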

oskarthaeter commented 8 months ago

I forked the recipes repo you mentioned and started a Dockerfile.

Booritas commented 8 months ago

You don't need this repo. You can download the compiled files from the Conan server. Here is my old code for the manylinux Dockerfile: https://github.com/Booritas/docker-manylinux . Now I would do it a bit differently: I would clone the slideio repo and run python install.py -a conan . That would install all dependencies in the container.

Booritas commented 8 months ago

In other words, I would use the beginning of the Dockerfile (to be updated to the new base image):

FROM quay.io/pypa/manylinux2014_x86_64:latest AS base
# Conan remote URL, passed in at build time (--build-arg CONAN_REMOTE=...)
ARG CONAN_REMOTE
ENV CONAN_REMOTE=${CONAN_REMOTE}
ENV CONAN_REVISIONS_ENABLED=1
# system packages needed for the build
RUN yum install -y wget vim gtk2-devel libva-devel soci-sqlite3-devel.x86_64
# update m4 and print its version before and after
RUN yum info m4
RUN yum -y update m4
RUN yum info m4
# make the bundled CPython 3.8 the default python3/pip3
RUN update-alternatives --install /usr/bin/python3 python3 /opt/python/cp38-cp38/bin/python3 10
RUN update-alternatives --install /usr/bin/pip3 pip3 /opt/python/cp38-cp38/bin/pip3 10
RUN pip3 install numpy
# install cmake
RUN yum remove cmake -y
RUN wget -qO- "https://github.com/Kitware/CMake/releases/download/v3.22.3/cmake-3.22.3-linux-x86_64.tar.gz" | tar --strip-components=1 -xz -C /usr/local
RUN update-alternatives --install /usr/bin/cmake cmake /usr/local/bin/cmake 10
RUN update-alternatives --install /usr/bin/ccmake ccmake /usr/local/bin/ccmake 10
# install conan (pinned below 2.0 on the assumption that install.py relies on
# Conan 1.x options such as -if and the cmake_multi generator)
RUN pip3 install "conan<2"
RUN update-alternatives --install /usr/bin/conan conan /opt/python/cp38-cp38/bin/conan 10

Then add:

  • clone the slideio repo
  • run python install.py -a conan
  • remove the slideio directory

This should do the trick.
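The three extra steps could be appended as a single RUN instruction, so that the cloned sources never end up in an image layer. Below is a rough sketch that only renders the instruction text. The function name, the /tmp/slideio path, and the repository URL are assumptions for illustration, and it presumes git is available in the base image and that python3 points at the interpreter configured earlier in the Dockerfile.

```python
def dockerfile_tail(repo_url: str, workdir: str = "/tmp/slideio") -> str:
    """Render one RUN instruction: clone, install dependencies, clean up.

    Chaining the steps with && in a single RUN means the clone is deleted
    in the same layer that created it.
    """
    steps = [
        f"git clone --depth 1 {repo_url} {workdir}",
        f"cd {workdir}",
        "python3 install.py -a conan",
        "cd /",
        f"rm -rf {workdir}",
    ]
    return "RUN " + " && \\\n    ".join(steps) + "\n"


# Repository URL assumed from the project name; adjust as needed.
print(dockerfile_tail("https://github.com/Booritas/slideio.git"))
```

The dependencies stay in the Conan cache inside the image after the source directory is removed, which is the point of the cleanup step.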

oskarthaeter commented 8 months ago

I see what you mean, but I'd like to actually compile every dependency specific to slideio, i.e. not pull anything from your slideio/stable conan server but build those dependencies in the image. That not only enables the close inspection of those dependencies but is also necessary to use a compiler of my choosing. Currently these dependencies prevent using some compilers due to ABI mismatches etc.

Booritas commented 8 months ago

You can force Conan to rebuild all packages that come from the server. It is very easy to do with a small change in the install.py script: you need to pass the --build parameter. Here is the modified function in install.py:

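import subprocess

def process_conan_profile(profile, trg_dir, conan_file):
    """Install the dependencies for one Conan profile, building them from source."""
    generator = "cmake_multi"
    command = [
        'conan', 'install',
        conan_file,       # placed before --build so it cannot be read as that option's value
        '-pr', profile,   # Conan profile (compiler, build type, options)
        '-if', trg_dir,   # install folder for the generated files
        '-g', generator,
        '--build',        # no argument: build every package from source
    ]
    print(command)
    subprocess.check_call(command)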

In this case, Conan takes only the recipes from the server and recompiles all dependencies. You can also set the compiler of your choice in the Conan profile:

[settings]
os=Linux
arch=x86_64
compiler=gcc
compiler.version=10
compiler.libcxx=libstdc++
build_type=Release
[options]
tinyxml2:shared=False
gdal:fPIC=True
[build_requires]
[env]

Conan supports quite a wide range of compilers.
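If several compilers need to be tried, the profile can be generated instead of edited by hand. A minimal sketch, assuming the same Conan 1.x profile layout as above; the function name is invented, and the GCC 12 / libstdc++11 values in the usage line are only an example of what a newer toolchain might need.

```python
def render_profile(compiler: str, version: str, libcxx: str,
                   build_type: str = "Release") -> str:
    """Render a Conan 1.x style profile with the given compiler settings."""
    lines = [
        "[settings]",
        "os=Linux",
        "arch=x86_64",
        f"compiler={compiler}",
        f"compiler.version={version}",
        f"compiler.libcxx={libcxx}",
        f"build_type={build_type}",
        "[options]",
        "tinyxml2:shared=False",
        "gdal:fPIC=True",
        "[build_requires]",
        "[env]",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical profile for a GCC 12 toolchain.
print(render_profile("gcc", "12", "libstdc++11"))
```

The rendered text can be written to a file and passed to install.py in place of the hand-written profile.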

After you run install.py, all dependencies will be recompiled with the compiler you set, copied to the local Conan cache, and ready to use. You neither have to recompile them next time nor pull anything from the server, so you can work offline.
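One way to get both behaviours from the same script is to make the build policy a parameter: a bare --build for the first full rebuild, and --build=missing afterwards so that existing binaries are reused and only packages without one are compiled. This is a sketch, not the actual install.py; the function names and the `rebuild_all` flag are assumptions, and the options shown are Conan 1.x ones.

```python
import subprocess


def conan_install_command(profile, trg_dir, conan_file, rebuild_all=False):
    """Build the argument list for a Conan 1.x `conan install` call.

    rebuild_all=True  -> "--build": compile every dependency from source.
    rebuild_all=False -> "--build=missing": reuse existing binaries and
                         compile only packages that have none.
    """
    policy = "--build" if rebuild_all else "--build=missing"
    return ["conan", "install", conan_file,
            "-pr", profile, "-if", trg_dir, "-g", "cmake_multi", policy]


def run_install(profile, trg_dir, conan_file, rebuild_all=False):
    """Print and execute the install command (requires conan on PATH)."""
    command = conan_install_command(profile, trg_dir, conan_file, rebuild_all)
    print(command)
    subprocess.check_call(command)
```

A first run with rebuild_all=True fills the cache; later runs with the default only do work when a dependency or the profile has changed.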