RenderKit / oidn

Intel® Open Image Denoise library
https://www.openimagedenoise.org/
Apache License 2.0
1.77k stars 164 forks source link

SSE4.1 causes crashes on older CPU #76

Closed frankspace closed 4 years ago

frankspace commented 4 years ago

In version 1.2.0, it appears that SSE4.1 is forcibly enabled and therefore causes a crash on CPUs that don't support it. In Issue #43 it was indicated that this should be auto-detected, but that doesn't seem to be the case. Please see this Arch Linux bug report for more details. Apparently, however, commit #7d4f9d8 results in SSE4.1 being manually set. Is there a way to go back to SSE4.1 being auto-detected instead? (Full disclosure: I'm not a programmer so I don't fully understand any of this.) Thank you!

atafra commented 4 years ago

Thanks for reporting the issue but i'm honestly quite puzzled by it. Version 1.2.0 certainly did not forcibly enable SSE4.1, that commit you mentioned was misinterpreted. Some source files are indeed always compiled with SSE4.1 but not all of them, and this is intentional. SSE4.1 is still autodetected and the SSE4.1 code should never be executed if the CPU doesn't support it, as it's behind an instruction set support check. However, the library has always required SSE4.1 to actually run (as stated in the documentation), so there's no point creating multiple package variants for it because a simple compile flag change will not make it work with older instruction sets. Issue #43 is still a mystery: we can't reproduce the issue and we didn't receive additional information so we closed it.

So the question is why does Blender crash when OIDN is being initialized despite the SSE4.1 check. I just tried to reproduce the issue outside Blender with the official OIDN binaries but it works as expected: OIDN reports an error that SSE4.1 is not supported and no crash happens. Could you please confirm this on your machine by running the tests example that ships with OIDN (it won't print a specific error but it shouldn't crash either)?

I'm wondering whether this crash is specific to the Arch packages of Blender and OIDN. Do you know whether OIDN was modified in any way for Arch (e.g. build flags)? Also, could you please describe steps for reproducing this issue from scratch? Thanks!

frankspace commented 4 years ago

Bafflingly, although I managed to build OIDN once using Arch's build script and chroot, I cannot do so again. (I have absolutely no clue whatsoever why it worked once.) I tried using Arch's current ninja-based build script, its previous more traditionalish cmake build script, and just downloading the official source tarball and following the instructions on the main github page. In every instance, compilation eventually failed in the same place:

/build/openimagedenoise/src/oidn/core/math.ih:10:18: Error: Redefining 
        uint8/uint16/uint32/uint64 type which is part of ISPC language since 
        version 1.13. Remove this typedef or use ISPC_UINT_IS_DEFINED to 
        detect that these types are defined. 
typedef unsigned int8  uint8;
                 ^^^^

/build/openimagedenoise/src/oidn/core/math.ih:11:18: Error: Redefining 
        uint8/uint16/uint32/uint64 type which is part of ISPC language since 
        version 1.13. Remove this typedef or use ISPC_UINT_IS_DEFINED to 
        detect that these types are defined. 
typedef unsigned int16 uint16;
                 ^^^^^

/build/openimagedenoise/src/oidn/core/math.ih:12:18: Error: Redefining 
        uint8/uint16/uint32/uint64 type which is part of ISPC language since 
        version 1.13. Remove this typedef or use ISPC_UINT_IS_DEFINED to 
        detect that these types are defined. 
typedef unsigned int32 uint32;
                 ^^^^^

/build/openimagedenoise/src/oidn/core/math.ih:13:18: Error: Redefining 
        uint8/uint16/uint32/uint64 type which is part of ISPC language since 
        version 1.13. Remove this typedef or use ISPC_UINT_IS_DEFINED to 
        detect that these types are defined. 
typedef unsigned int64 uint64;
                 ^^^^^

None of that makes any sense to me, but one way or the other, OIDN doesn't seem to build on my box. I have to admit, my understanding is that Arch's chroot environment should prevent any weird build flags from being introduced, but my confidence in that understanding is not that great.

However, more immediately relevantly, I built a script to install the official binaries as a "openimagedenoise-bin" package. And with that installed... Blender works perfectly. And as a test, I tried installing Arch's official package just to see if anything changed, and nope, Blender crashes immediately. So... I will update my bug report over there. Thank you!

FFY00 commented 4 years ago

My initial assessment when I was searching for the error was that SSE 4.1 was being forced. Which I think it's fair since my lack of context about the commit. When I was updating the package I noticed that this might not be the case, so I didn't do it. I was meaning to look into this properly but other stuff came and I hadn't been able to come back to the issue.

The package should be reproducible. It seems ispc got updated to 1.13 and is not playing well with the latest openimagedenoise release. Backporting https://github.com/OpenImageDenoise/oidn/commit/e321d7c90a2c706a521a3afd8913af36b121dc9e makes the build work again. I just tested and was able to build the exact same package, building the package multiple times produces the same checksum.

==> comparing artifacts...
  -> Package 'openimagedenoise-1.2.0-2-x86_64.pkg.tar.zst' successfully reproduced!

Our build flags are the following

CPPFLAGS='-D_FORTIFY_SOURCE=2'
CFLAGS='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt'
CXXFLAGS='-march=x86-64 -mtune=generic -O2 -pipe -fno-plt'
LDFLAGS='-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Here's the fixed PKGBUILD. @atafra this might also be useful to you, to see how exactly we build the project.

# Maintainer: Filipe Laíns (FFY00) <lains@archlinux.org>
# Maintainer: Sven-Hendrik Haase <svenstaro@gmail.com>

pkgname=openimagedenoise
_pkgname=oidn
pkgver=1.2.0
pkgrel=2
pkgdesc='Intel(R) Open Image Denoise library'
arch=('x86_64')
url='https://openimagedenoise.github.io'
license=('Apache')
depends=('intel-tbb' 'python')
makedepends=('git' 'cmake' 'ninja' 'ispc')
source=("git+https://github.com/OpenImageDenoise/oidn#tag=v$pkgver"
        'git+https://github.com/OpenImageDenoise/oidn-weights'
        "oidn-mkl-dnn::git+https://github.com/OpenImageDenoise/mkl-dnn"
        'fix-ispc.patch::https://github.com/OpenImageDenoise/oidn/commit/e321d7c90a2c706a521a3afd8913af36b121dc9e.patch')
sha512sums=('SKIP'
            'SKIP'
            'SKIP'
            'b03916098e771fb0467d32d60aa687c804da1d184956b392489e5943c8d71e439c05d05d809fc8e616a45a1cdcc9425cd29e73bd2222f4b6a2093c3eb09fadb2')

prepare() {
    cd $_pkgname

    git submodule init
    git config submodule.weights.url "$srcdir"/oidn-weights
    git config submodule.mkl-dnn.url "$srcdir"/oidn-mkl-dnn
    git submodule update

    patch -p1 -i ../fix-ispc.patch
}

build() {
    cd $_pkgname

    cmake \
        -B build \
        -G Ninja \
        -DCMAKE_INSTALL_PREFIX=/usr \
        -DCMAKE_BUILD_TYPE=Release
    ninja -C build
}

package() {
    cd $_pkgname
    DESTDIR="$pkgdir" ninja -C build install
    rm "$pkgdir/usr/bin/tests"
}

The package should be built with extra-x86_64-build.

@atafra could you see if you can reproduce using the same flags as us?

cc @svenstaro

frankspace commented 4 years ago

I'm sorry, I did not mean to imply any negligence on anyone's part!

I had previously been attempting to compile this using extra-x86_64-build.

I tested the updated PKGBUILD posted above. It compiled successfully using extra-x86_64-build -c. Unfortunately, when I install it, Blender goes back to crashing for me with the same error reported above.

I don't know if this is at all helpful, but I ldd'd the official binary libOpenImageDenoise.so.1.2.0 with the one generated by the above PKGBUILD, and the official binary has a dependency on libtbbmalloc.so.2 not present in the one built by the PKGBUILD. I'm not really a programmer so I have no idea if that's relevant, but I thought I'd offer the data-point in case it is.

FFY00 commented 4 years ago

libtbbmalloc.so.2 is provided by intel-tbb, in 1.2.0 @svenstaro changed the PKGBUILD to vendor TBB, I am not sure why. You can try building without a vendored TBB and see if it works.

atafra commented 4 years ago

Thanks @frankspace and @FFY00 for all the details! If Blender doesn't crash with the official binaries it seems there's an issue with the Arch packaging but I don't have a clue what could that be yet. I don't think it's TBB related.

@FFY00, is it standard practice to override the default build flags? The flags you copied shouldn't cause a crash but I don't recommend it as it will at least cause lower performance. We use different flags (and instruction sets) for different files for optimal performance without breaking compatibility with older architectures. Since the flags are different from the official build, this might be an issue.

I would like to reproduce the crash but I've never used Arch before so could you please provide some brief steps for building everything? We're going to release v1.2.1 by the end of this week, which will include the fix for ISPC 1.13, and I would like to make sure that this crash is fixed too.

FFY00 commented 4 years ago

@FFY00, is it standard practice to override the default build flags?

Yes, the most relevant reason is to ensure we we have PIE and relro.

it will at least cause lower performance.

I assume you are talking about using -O2 over -O3 as this is the only performance-related option. That is not really correct, it is a very common misconception of -O3. -O3 enables all optimization, but not all optimizations are desirable depending on the use-case. Optimizations are a tradeoff, you gain performance in one place, you lose in another. -O2 provides a curated list of optimizations, which was selected to provide the most performance in the general use-case. Unless you ran benchmarks comparing -O2 and -O3, -O2 is the safest bet.

We use different flags (and instruction sets) for different files for optimal performance without breaking compatibility with older architectures. Since the flags are different from the official build, this might be an issue.

That shouldn't be a problem, we don't overwrite anything, we just provide a default.

I would like to reproduce the crash but I've never used Arch before so could you please provide some brief steps for building everything?

Basically you just install devtools on an arch machine and run extra-x86_64-build on a directory with the target PKGBUILD. extra-x86_64-build will build the package in a clean chroot (actually, a systemd-nspawn container). Let me know if you have any trouble trying to setup an environment, feel free to contact me by email, irc (FFY00 on freenode), etc.

atafra commented 4 years ago

I assume you are talking about using -O2 over -O3 as this is the only performance-related option.

No, I meant primarily the SSE4.1 optimization flag which does have a performance impact. In the future we might add AVX2 and AVX-512 flags to some files, so removing these flags will cause an even bigger slowdown.

That shouldn't be a problem, we don't overwrite anything, we just provide a default.

Oh, so if the project adds ISA flags, those will be preserved? In that case, this should be perfectly fine.

Basically you just install devtools on an arch machine and run extra-x86_64-build on a directory with the target PKGBUILD.

How do I build specifically Open Image Denoise and Blender this way? Thanks!

frankspace commented 4 years ago

Sorry, for the delayed response, I got my bug reports mixed up!

I'm probably not the best person to ask about how the compilation works under the hood. However, the build process basically works like this:

Arch (and derivatives) provide a build recipe called a "PKGBUILD." Fortunately, Arch's implementation of OIDN doesn't depend on any other provided files (patches, etc), so you can just download that file from here. Ordinarily, you can just plonk that file into a directory wherever you want to build your package, type "makepkg," and Arch's build tools will run through that recipe, automatically downloading sources, extracting them into a working directory, building, installing, etc. And usually that works fine. However, some people like to tamper with their build flags for various reasons (CPU-specific optimization, for example), so there's also the option of building in a clean chroot. That, however, requires Arch's devtools package, which depends on systemd, which could be a problem if, like me, you want to move to a different init system. But, if you do have systemd on your box, the extra-x86_64-build command automatically sets up a clean build environment container with Arch's default build flags and with only the minimal dependencies installed, and then runs makepkg inside that context.

Alas, although I can more or less describe what happens, I really do not have any meaningful understanding of how it works under the hood. Complicating things, and not very clear from the wiki, is that PKGBUILD recipes also have some options, like whether to strip debugging symbols, leave static libraries, etc. The OIDN PKGBUILD doesn't appear to deviate from the defaults, which are listed about halfway down the page here and actually explained about halfway down the page here. Again, I don't fully understand all of that, nor do I understand why Arch isn't downloading the source tarball or is using ninja, but I trust they have good reasons and the explanation would almost certainly be beyond my comprehension as a non-programmer. As noted, I couldn't get it to compile using the official instructions anyway.

Sorry I can't help further. I am probably going to create an "openimagedenoise-bin" package in the Arch User Repository in the meantime, which would just repackage the official binaries (which work fine as a drop-in replacement).

jessey-git commented 4 years ago

Note that blender officially is still on OIDN 1.0. The upgrade to 1.2 has not occurred for official builds yet.

atafra commented 4 years ago

I'm still working on setting up an Arch environment to reproduce the bug, but while doing so I discovered another (probably unrelated) major issue with the Arch package for OIDN 1.2.0: it's corrupted as we've switched to using Git LFS for the large binary files since 1.2.0 but the package is built without using LFS when cloning the repository, so those binary files are just LFS references. The library builds fine this way but at runtime it won't work.

FFY00 commented 4 years ago

Oh, I will look into it. Maybe it is worth adding a check in the build system.

atafra commented 4 years ago

Thanks @FFY00 ! We’ll add a check in the build system in the next release, printing a message that Git LFS is required to correctly clone the repository.

Meanwhile I managed to reproduce the crash, OIDN indeed executes an illegal instruction (pinsrq) on old architectures but not only when executing Blender but also when running the denoise example that ships with OIDN. So the issue is independent from Blender. I tried the OIDN package installed with pacman. I’m currently investigating the cause.

atafra commented 4 years ago

Fixed in https://github.com/OpenImageDenoise/oidn/commit/a0060e2b7055f903ffd3de01adafa2eb75cf68a8 in the devel branch (the crash was caused by a static initializer in mkl-dnn, which is compiled for SSE4.1, and the latest GCC generates some SSE4.1 instructions unlike other/older compilers). We'll release v1.2.1 with the fix in a couple of days.

Regarding the PKGBUILD: apart from the incorrect clone without Git LFS (maybe pulling the source package instead of using git would be easier?), just a minor note: python is only a build dependency, so it could be moved to makedepends.

svenstaro commented 4 years ago

The SSE issue is fixed and the build in our package is also fixed. Thanks for the hint, @atafra. I think this issue can be closed.

svenstaro commented 4 years ago

Can you make a release containing a0060e2b7055f903ffd3de01adafa2eb75cf68a8?

ghost commented 4 years ago

still having the problem

atafra commented 4 years ago

still having the problem

Could you please provide some details? Did you try the latest source code in the devel branch?

atafra commented 4 years ago

Can you make a release containing a0060e2?

Yes, we've just released v1.2.1 which includes the fix.

svenstaro commented 4 years ago

Can you make a release containing a0060e2?

Yes, we've just released v1.2.1 which includes the fix.

Thanks, we're updated to it and it seems to work splendidly!

svenstaro commented 4 years ago

I believe this can be closed then.