FLAMEGPU / FLAMEGPU2

FLAME GPU 2 is a GPU accelerated agent based modelling framework for CUDA C++ and Python
https://flamegpu.com
MIT License

Python Binary Distribution with multiple configurations #605

Open ptheywood opened 3 years ago

ptheywood commented 3 years ago

The initial python binary distribution(s) will be / are Release mode, seatbelts ON only.

This is the default, sensible choice for python users: they can develop models that run reasonably well (i.e. not terribly slow debug builds) while also getting sane error messages if there are issues with agent functions. If python users require faster (seatbelts OFF) builds, they will initially have to build from source.

Longer term, it would be nice to provide pre-built binaries for seatbelts=OFF, or to support two major CUDA versions, or other options that result in a larger build matrix. Doing this in a wheel-compliant way is non-trivial.

Python namespace packages are a potential solution, or multiple packages which users can import with alternate names (i.e. pyflamegpu-seatbeltsoff, which users can then import as pyflamegpu for a 1 line change)
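For example, under the alternate-name approach the one-line change would be something like the following (the seatbelts-off module name here is hypothetical):

    import pyflamegpu_seatbeltsoff as pyflamegpu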

ptheywood commented 3 years ago

Using cibuildwheel might be a reasonable way to build pypi compliant wheels? Would need investigation.

https://pypi.org/project/cibuildwheel/
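A rough sketch of what a local invocation could look like (untested; real usage would need configuration to pick Python versions and a CUDA-capable build image):

    python3 -m pip install cibuildwheel
    python3 -m cibuildwheel --platform linux --output-dir wheelhouse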

ptheywood commented 3 years ago

This is also relevant to the CUDA version used to build the wheel.

The 2.0.0-alpha wheels are built using CUDA 11.0, as this is the version available on Google Colab at this time. This adds a runtime dependency on libnvrtc.so.11.0, so if you do not have 11.0 installed, you will get an error:

 ImportError: libnvrtc.so.11.0: cannot open shared object file: No such file or directory

This is to be expected.

If we build with 11.2 instead, this would be compatible with CUDA 11.2+ (but presumably not 12.x) due to changes in how NVRTC is packaged from CUDA 11.3. See this NVIDIA blog post:

In CUDA toolkits prior to CUDA 11.3, the SONAME value was in the form MAJOR.MINOR and the DLL filename was in the form nvrtc64_XY_0.dll, where X=MAJOR, Y=MINOR. Starting from CUDA 11.3, and for all future CUDA 11.x toolkit releases, the NVRTC shared library version will not change and will be frozen at 11.2. The SONAME in the Linux version of the library is 11.2 and the corresponding DLL filename in Windows is nvrtc64_112_0.dll.

From the next major CUDA release onwards, X (which will be greater than 11), the NVRTC shared library’s SONAME and its DLL filename equivalent will only encode the CUDA major version. On Linux, the SONAME will be X and on Windows the DLL filename will be nvrtc64_X0_0.dll, where X is the major version.

We can potentially provide 11.0 wheels for Colab, and 11.2 wheels for more recent CUDA support (with tweaks to reduce the build matrix, potentially?). Again, this would lead to more breakage of wheel filename conventions to differentiate the different CUDA versions (and dependencies on external .so's are breakages anyway).

ptheywood commented 3 years ago

Rather than manually renaming wheels from pyflamegpu to pyflamegpu-console etc., it appears that this might actually be possible via setuptools, while the python package itself is still called pyflamegpu.

I.e.

setuptools.setup(name="pyflamegpu-console", packages=["pyflamegpu"])

In this case, the generated wheel would be pyflamegpu-console-<stuff>.whl.

If we were uploading these to PyPI, each would be a separate package, available as pyflamegpu-console etc., while the main pyflamegpu package would have vis on(?). The wheels would still be out of spec, however, due to the libcuda.so dependency preventing them from being truly conforming manylinux wheels.

Package extras may be another option, e.g. pyflamegpu would be the console build, and pyflamegpu[visualisation] would also install the visualisation components. This would need changes to how the VISUALISATION macro works in C++, and might also need the swig build splitting into modules, I'm not sure. The general use of extras seems to be that the main package is minimal, with extras being opt-in. This would work for vis if we restructure it, but I don't think it'll work for seatbelts on/off. I am leaning towards console-only being the default, and vis being an opt-in extra, as vis is not always required and doesn't exist for ensembles.

So if we do go down the multiple-packages route, pyflamegpu is console mode, and pyflamegpu-visualisation is with vis. It might be possible to combine these approaches (but I need to read more / try things).
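To make the extras idea concrete, a hedged sketch of the setup call (the extra name and the separate visualisation distribution are illustrative, nothing is decided):

    import setuptools

    setuptools.setup(
        name="pyflamegpu",
        packages=["pyflamegpu"],
        extras_require={
            # opt-in visualisation components, assuming they can be split
            # into a separately built distribution
            "visualisation": ["pyflamegpu-visualisation"],
        },
    )

Users would then opt in with pip install pyflamegpu[visualisation], while a plain pip install pyflamegpu stays console-only.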

Multiple CUDA versions might be more challenging / don't fit as an extra, when you cannot specify a default or make extras mutually exclusive. We need to provide CUDA 11.0 for Google Colab, but I'd much prefer that 11.2 become the main / default option, as this will work with all future 11.x releases due to changes in CUDA shared object versioning / compatibility going forwards. When CUDA 12 turns up we would then want to also provide CUDA 12 images, and probably maintain support / pre-built binaries for multiple CUDA versions?

I do still (potentially) prefer conda for distribution, for labels etc., but I still need to read more on how conda recipes work, and how binary distribution actually works w.r.t. older glibc etc. We don't want users having to do source builds of pyflamegpu given how long the pyflamegpu target takes to compile. Conda provides packages for the CUDA toolkit, so conda install pyflamegpu cudatoolkit=11.2 or similar would either build pyflamegpu with that version, or hopefully download a matching binary distribution. I'm not 100% sure on this though still.

ptheywood commented 3 years ago

From the CUDA Toolkit EULA, libnvrtc.so, libcurand[_static].so, libcudart[_static].so etc. and their dll equivalents are redistributable and can be included in wheels / docker images, but the driver libraries (libcuda.so, libnvidia-ptxjitcompiler.so; there is no equivalent windows .dll?) are not:

The NVIDIA CUDA Driver Libraries are only distributable in applications that meet this criteria: The application was developed starting from a NVIDIA CUDA container obtained from Docker Hub or the NVIDIA GPU Cloud, and The resulting application is packaged as a Docker container and distributed to users on Docker Hub or the NVIDIA GPU Cloud only.

In the manylinux container, I'm only installing the stub library of libcuda.so, as the driver is not being distributed anyway? Need to investigate more (still).

ptheywood commented 3 years ago

PyTorch appears to use conda as the primary distribution now, based on their get-started page, which has a nice table for generating installation instructions.

To provide multiple CUDA version support via pip, they provide CUDA 10.2 releases to PyPI directly, and CUDA 11.1 is available via pip's -f/--find-links <url> argument and a local package version of +cu111.

I.e. pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

We could potentially do something similar by generating an html page on the website which points to the various wheels attached to releases, through some funky GitHub API usage. Shouldn't be too difficult to do (possibly as a separate gh-pages repo whl, hosted at download.flamegpu.com, which would provide download.flamegpu.com/whl/flamegpu.html for non-prerelease builds and download.flamegpu.com/whl/flamegpu_prerelease.html for pre-releases). Then we'd use +cuda11.1 or similar as a local version label, pointing to the GitHub release artifact. An action on publication of a release could trigger an update to the gh-pages branch of the other repo. pip would require the --pre argument for pre-release builds if the prerelease .html were provided.
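A minimal sketch of the page-generation step, assuming we just list .whl assets from GitHub releases via the public REST API (the output filename is illustrative):

    # generate a flat find-links style html page from release wheel assets
    import json
    import urllib.request

    RELEASES_API = "https://api.github.com/repos/FLAMEGPU/FLAMEGPU2/releases"

    with urllib.request.urlopen(RELEASES_API) as response:
        releases = json.load(response)

    links = []
    for release in releases:
        for asset in release.get("assets", []):
            if asset["name"].endswith(".whl"):
                links.append('<a href="{}">{}</a><br/>'.format(
                    asset["browser_download_url"], asset["name"]))

    with open("flamegpu.html", "w") as index:
        index.write("<html><body>\n" + "\n".join(links) + "\n</body></html>\n")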

I'm not sure this helps with seatbelts on/off still; the <version>+<local> scheme doesn't really leave room for multiple options? Otherwise it'd end up being +cuda111+seatbeltsoff or +seatbeltsoff or +cuda111, and the order would be very important / easy to get wrong.

Conda might still be cleaner / easier as an alternative, though.

ptheywood commented 3 years ago

The CUDA 11.4 early-access official CUDA Python bindings don't look like they will be of any use to us, as all our CUDA is handled within the CUDA C++ library. They will be available as a package on PyPI and conda in the future, so might become a dependency for better package management, but that's unclear at this stage.

ptheywood commented 3 years ago

Note that there is a strong convention to name a project after the name of the package that is imported to run that project. However, this doesn’t have to hold true. It’s possible to install a distribution from the project ‘foo’ and have it provide a package importable only as ‘bar’.

https://packaging.python.org/glossary/#term-Project

I.e. it's convention that pip install pyflamegpu would provide the python package pyflamegpu, but providing a separate pyflamegpu-vis or pyflamegpu-fast that also provides pyflamegpu wouldn't strictly be incorrect (it would just make those packages mutually exclusive in a given python environment).

ptheywood commented 3 years ago

Regarding the use of Python local version identifiers (and separate pip sources) such as PyTorch's +cu111:

PEP 440 specifies that:

Local version identifiers MUST comply with the following scheme:

<public version identifier>[+<local version label>]

... Local version labels have no specific semantics assigned, but some syntactic restrictions are imposed.

Local version identifiers are used to denote fully API (and, if applicable, ABI) compatible patched versions of upstream projects. For example, these may be created by application developers and system integrators by applying specific backported bug fixes when upgrading to a new upstream release would be too disruptive to the application or other integrated system (such as a Linux distribution).

...

To ensure local version identifiers can be readily incorporated as part of filenames and URLs, and to avoid formatting inconsistencies in hexadecimal hash representations, local version labels MUST be limited to the following set of permitted characters:

  • ASCII letters ([a-zA-Z])
  • ASCII digits ([0-9])
  • periods (.)

I.e. downstream projects can redistribute builds with local versions, but they must be API compatible. torch are essentially providing their own "downstream" builds, which are API compatible with the "upstream" PyPI distribution.

The CUDA local version label is cu111 because of the rules on which characters are allowed in the version (i.e. no dashes) and the special meaning of . when comparing local versions.

Comparison and ordering of local versions considers each segment of the local version (divided by a .) separately. If a segment consists entirely of ASCII digits then that section should be considered an integer for comparison purposes and if a segment contains any ASCII letters then that segment is compared lexicographically with case insensitivity. When comparing a numeric and lexicographic segment, the numeric section always compares as greater than the lexicographic segment. Additionally a local version with a great number of segments will always compare as greater than a local version with fewer segments, as long as the shorter local version's segments match the beginning of the longer local version's segments exactly.

Personally, I'd prefer cuda111, and this scheme could very easily break if CUDA 11.x lasts until 11.10, although that is unlikely given the ~3-monthly minor CUDA release schedule, and new architectures being released every 2 years or so (and there's no precedent for this).

So using local versions for CUDA versions seems very appropriate, but PyPI releases shouldn't/can't have a local version number, so for a given PyPI package only a single (base) CUDA version can be supported this way.

Personally, I'd be keen for this to be 11.2 for forwards compatibility, with +cuda110 being available from ourselves as an alternative for Colab. Once 12.x turns up, we would then also provide a +cuda120 build ourselves.

We could also potentially include the minimum SM that the build supports in there as a separate point-separated section, i.e. cuda110.sm35 and cuda110.sm60. This would have weird version comparison semantics though, with sm60 being "newer" than sm35, and I think that might mean if you just asked for cuda110 you'd get the one with the least support...
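The ordering can be checked with the packaging library (a quick sketch; the version strings are hypothetical):

    from packaging.version import Version

    # a local version with more segments compares greater when the shorter
    # one is an exact prefix of the longer one
    assert Version("2.0.0+cuda110") < Version("2.0.0+cuda110.sm35")

    # alphanumeric segments compare lexicographically, so sm60 > sm35 and
    # a bare request for the latest version would pick the sm60 build
    assert Version("2.0.0+cuda110.sm35") < Version("2.0.0+cuda110.sm60")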

However, using local versions for SEATBELTS and VISUALISATION doesn't make sense, as these impact the API (although the SEATBELTS API changes might not actually be exposed via swig, with the only difference being the value of pyflamegpu.SEATBELTS, but I'm not 100% sure on this).

For now, I think I'm going to go with:

mondus commented 1 year ago

Colab has now moved to 11.2

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0

ptheywood commented 1 year ago

We still probably want to have 11.0 dists as our minimum supported version, then 11.2, which is generally ABI compatible with other 11.x releases (some wiggle room till 11.3 IIRC in some libs / NVRTC).

We would have to do 11.8 as well if we want to provide an SM 90 (Hopper) pre-compiled release, though we do embed PTX so it'll be usable on those devices anyway via JIT (just not leveraging SM90 optimisations). Each new CUDA version does bloat the release matrix even more.

The main concern in this issue is the separation of vis and non-vis builds. A (painful for us, nice for everyone once done) option is to always do vis builds, but dlopen the vis component(s) if appropriate instead (and invert the vis --console flag to be --visualiser instead). But that's a load of faff, as we'd have to stub everything (this must be a common enough issue that someone has written a decent tool for generating stubs that I've just failed to find in the past). If we did that however, all binary dists would be vis, halving the matrix size and solving our problematic wheels with different names that install the same package.

The CUDA version etc. is handled similarly to other packages via the + version info, which is fine if not ideal. CMake might change that (and might need a bigger release matrix, given users can probably specify the CUDA version they want...).

ptheywood commented 1 year ago

The current renaming of the visualisation wheels to pyflamegpu_vis via CMake not only causes issues when installing both wheels, but also causes the test_version.py tests to fail for visualisation builds (when using Python >= 3.8) due to the metadata search for the package named pyflamegpu not finding anything (as the distribution is named pyflamegpu_vis, but the module is still pyflamegpu).
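A quick illustration of the failure mode (a sketch; presumably the tests use importlib.metadata given the Python >= 3.8 condition):

    from importlib.metadata import version, PackageNotFoundError

    import pyflamegpu  # the module imports fine regardless of the wheel name

    try:
        print(version("pyflamegpu"))  # metadata lookup is by distribution name
    except PackageNotFoundError:
        # raised when the installed distribution is named pyflamegpu_vis
        print("no distribution named 'pyflamegpu' installed")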

If we move the vis/non-vis status into the local version (like the CUDA version), that would solve both of those issues, but would make installing the vis version less nice. It's probably the best short-term option, given we aren't immediately implementing conda packages or dlopening the visualisation and only shipping vis wheels.

ptheywood commented 1 year ago

From 2.0.0rc1, we will have CUDA 11.2 and 12.0 wheels built on CI, available as non-manylinux-compliant wheels from the GitHub releases page and via whl.flamegpu.com.

Both vis and non-vis builds are available.

The local Python package version is used to differentiate the builds, which is not PyPI compatible (i.e. we could only provide a single variant on PyPI).

Users can either very explicitly request something like pyflamegpu==2.0.0rc0+cuda112.vis from whl.flamegpu.com, or can use one of the more specific index files such as whl.flamegpu.com/whl/cuda112. If they just ask for pyflamegpu though, due to the ordering rules of local versions, +cuda112.vis is greater than +cuda112, so vis builds are the default from the primary index.
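Concretely, installs would look something like the following (whether -f/--find-links or an index URL option is the right flag depends on how the pages are served; -f matches the approach sketched earlier in this thread):

    pip install "pyflamegpu==2.0.0rc0+cuda112.vis" -f https://whl.flamegpu.com/whl/
    pip install pyflamegpu -f https://whl.flamegpu.com/whl/cuda112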

Still no seatbelts-off pre-built Python packages are available, although this could be something we add to the wheelhouse if we want.

Conda is probably still the better choice; at the very least it makes multiple CUDA versions easier to select between. Looks like 12.x might need different handling than 11.x though (we can depend on only the parts of CUDA needed, making installs lighter).