AnyDSL / thorin

The Higher-Order Intermediate Representation
https://anydsl.github.io
GNU Lesser General Public License v3.0
151 stars 15 forks source link

Important: Recent changes break code generation #124

Open PearCoding opened 2 years ago

PearCoding commented 2 years ago

JIT code generation triggers a lot of errors and produces (for some targets) artifacts while rendering, some targets (nvvm) do not even run. The bug appeared after updating AnyDSL (from a version before March 2022), not by modifying Ignis.

AVX2 artifacts

You can repeat the bug with any scene in Ignis when using recent AnyDSL and whatever target you want. The generic, avx2 and nvvm all fail and produce a similar error output. Generic does not produce artifacts and renders totally fine. AVX2 produces artifacts as shown in the image. NVVM produces a CUDA_ERROR_LAUNCH_FAILED and crashes. The problem has to be in thorin (or artic). Tried a previous version of rv (which definitely worked before) and the bug still exists.

Here are some errors messages triggered by thorin for avx2: errors.txt and the (large!) dump produced by thorin (or artic): debug.txt The debug.txt contains multiple jit shaders and some Ignis log output as well, as it was just grabbed from the stdout.

I will have to see how to single out the problem, but that will take some time. I will also look which commit broke the code generation. Nevertheless, I will post the issue now. Maybe you guys already know where the problem is?

The bug happened on Linux, Ubuntu 22.04. Did not try Windows yet.

Hugobros3 commented 2 years ago

We recently merged a large refactoring of Thorin, could be related. When is the last time you updated your AnyDSL install before ? I will be away for the weekend sadly, but I will make sure this is solved asap when I am back.

PearCoding commented 2 years ago

The only thing I quite remember is that there was around 80 new commits in thorin. That large number was the reason I updated it :D

PearCoding commented 2 years ago

So, I further tested standard compilation (no JIT) and no errors seem to be triggered. It very likely is a JIT only problem. Maybe the jit.cpp in AnyDSL/runtime has to be adapted to the recent changes?

PearCoding commented 2 years ago

All the errors can be reproduces (for AVX2) with the following code: v0_missShader_full.txt

The code is artic source code and still quite large. I try to minimize it.

I used the following artic line to compile it:

~/dev/anydsl/artic/build/bin/artic ~/dev/anydsl/runtime/platforms/artic/intrinsics*.impala ~/dev/anydsl/runtime/platforms/artic/runtime.impala v0_missShader_full.art --emit-llvm --log-level info -Wall

Adapt it to your needs.

Knowing it can be produced without JIT means its not a JIT, but a weird code error.

PearCoding commented 2 years ago

More information:

Found a single line which triggers the error messages (the artifacts still remain).

if mat.bsdf.is_specular || num_lights == 0 {
   return(ShadowRay::None)
}

The line is from Ignis https://github.com/PearCoding/Ignis/blob/4a0f08e4bcc62a0a8b6fa2cfda27eb40c847ae37/src/artic/impl/technique/pathtracer.art#L73 If num_lights is removed, the compilation process is fine, only the artifacts remain and nvvm still crashes. A very important factor is that num_lights is always a compile time constant. The function make_path_renderer is called with a literal value for num_lights, in contrary to max_path_len which is a runtime variable.

Looking at the artifacts and nvvm I fear this is more than a single bug. :/

PearCoding commented 2 years ago

These are the last log entries which definitely worked:

Thorin:

commit 719833f675dc98452d432444bba0b0bb6ff3e30e (HEAD -> master, origin/master, origin/HEAD)
Date:   Fri Feb 18 20:22:33 2022 +0100
    Whitespace, formatting.

Artic:

commit 36e8f452583d50a20256ee021282bad6188a5c76 (HEAD -> master, origin/master, origin/HEAD)
Date:   Tue Jan 25 19:29:40 2022 +0100
    fix doc for ExprStmt

Runtime:

Merge: 4822907 9f91939
Date:   Thu Mar 31 00:08:13 2022 +0200
    Merge pull request #30 from AnyDSL/add-cuda-cpp-version-option
    Add AnyDSL_runtime_CUDA_CXX_STANDARD option
PearCoding commented 2 years ago

Yet another quick update. The artifacts with AVX2 are an issue with Ignis not thorin (most likely) as they still occur within the very old commits, but the error messages do not.

PearCoding commented 2 years ago

The following file contains a mini-dump on Windows, with a crash due to a stack overflow. Starting at the line https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/transform/importer.cpp#L81 and (after a recursion depth of > 64) crashing inside https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/transform/importer.cpp#L86 exactly at https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/world.cpp#L104 The information might become handy or helpful for debugging. igview.zip

PearCoding commented 1 year ago

Fixed with daa65233460e4ae8dd89b9849c071dd18b2dd043

PearCoding commented 1 year ago

The following file contains a mini-dump on Windows, with a crash due to a stack overflow. Starting at the line

https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/transform/importer.cpp#L81

and (after a recursion depth of > 64) crashing inside

https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/transform/importer.cpp#L86 exactly at

https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/world.cpp#L104

The information might become handy or helpful for debugging. igview.zip

This bug still exists. Therefore, I am reopening the issue