Open PearCoding opened 2 years ago
We recently merged a large refactoring of Thorin, could be related. When is the last time you updated your AnyDSL install before ? I will be away for the weekend sadly, but I will make sure this is solved asap when I am back.
The only thing I quite remember is that there was around 80 new commits in thorin. That large number was the reason I updated it :D
So, I further tested standard compilation (no JIT) and no errors seem to be triggered. It very likely is a JIT only problem. Maybe the jit.cpp in AnyDSL/runtime has to be adapted to the recent changes?
All the errors can be reproduces (for AVX2) with the following code: v0_missShader_full.txt
The code is artic source code and still quite large. I try to minimize it.
I used the following artic line to compile it:
~/dev/anydsl/artic/build/bin/artic ~/dev/anydsl/runtime/platforms/artic/intrinsics*.impala ~/dev/anydsl/runtime/platforms/artic/runtime.impala v0_missShader_full.art --emit-llvm --log-level info -Wall
Adapt it to your needs.
Knowing it can be produced without JIT means its not a JIT, but a weird code error.
More information:
Found a single line which triggers the error messages (the artifacts still remain).
if mat.bsdf.is_specular || num_lights == 0 {
return(ShadowRay::None)
}
The line is from Ignis https://github.com/PearCoding/Ignis/blob/4a0f08e4bcc62a0a8b6fa2cfda27eb40c847ae37/src/artic/impl/technique/pathtracer.art#L73
If num_lights
is removed, the compilation process is fine, only the artifacts remain and nvvm still crashes.
A very important factor is that num_lights
is always a compile time constant. The function make_path_renderer
is called with a literal value for num_lights
, in contrary to max_path_len
which is a runtime variable.
Looking at the artifacts and nvvm I fear this is more than a single bug. :/
These are the last log entries which definitely worked:
Thorin:
commit 719833f675dc98452d432444bba0b0bb6ff3e30e (HEAD -> master, origin/master, origin/HEAD)
Date: Fri Feb 18 20:22:33 2022 +0100
Whitespace, formatting.
Artic:
commit 36e8f452583d50a20256ee021282bad6188a5c76 (HEAD -> master, origin/master, origin/HEAD)
Date: Tue Jan 25 19:29:40 2022 +0100
fix doc for ExprStmt
Runtime:
Merge: 4822907 9f91939
Date: Thu Mar 31 00:08:13 2022 +0200
Merge pull request #30 from AnyDSL/add-cuda-cpp-version-option
Add AnyDSL_runtime_CUDA_CXX_STANDARD option
Yet another quick update. The artifacts with AVX2 are an issue with Ignis not thorin (most likely) as they still occur within the very old commits, but the error messages do not.
The following file contains a mini-dump on Windows, with a crash due to a stack overflow. Starting at the line https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/transform/importer.cpp#L81 and (after a recursion depth of > 64) crashing inside https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/transform/importer.cpp#L86 exactly at https://github.com/AnyDSL/thorin/blob/6fa217b6709120a8d7ecfd51de4e4f8358180e1d/src/thorin/world.cpp#L104 The information might become handy or helpful for debugging. igview.zip
Fixed with daa65233460e4ae8dd89b9849c071dd18b2dd043
The following file contains a mini-dump on Windows, with a crash due to a stack overflow. Starting at the line
and (after a recursion depth of > 64) crashing inside
The information might become handy or helpful for debugging. igview.zip
This bug still exists. Therefore, I am reopening the issue
JIT code generation triggers a lot of errors and produces (for some targets) artifacts while rendering, some targets (nvvm) do not even run. The bug appeared after updating AnyDSL (from a version before March 2022), not by modifying Ignis.
You can repeat the bug with any scene in Ignis when using recent AnyDSL and whatever target you want. The generic, avx2 and nvvm all fail and produce a similar error output. Generic does not produce artifacts and renders totally fine. AVX2 produces artifacts as shown in the image. NVVM produces a
CUDA_ERROR_LAUNCH_FAILED
and crashes. The problem has to be in thorin (or artic). Tried a previous version of rv (which definitely worked before) and the bug still exists.Here are some errors messages triggered by thorin for avx2: errors.txt and the (large!) dump produced by thorin (or artic): debug.txt The debug.txt contains multiple jit shaders and some Ignis log output as well, as it was just grabbed from the stdout.
I will have to see how to single out the problem, but that will take some time. I will also look which commit broke the code generation. Nevertheless, I will post the issue now. Maybe you guys already know where the problem is?
The bug happened on Linux, Ubuntu 22.04. Did not try Windows yet.