adamrehn / ue4-docker

Windows and Linux containers for Unreal Engine 4
https://docs.adamrehn.com/ue4-docker/
MIT License

Stuck on => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph -target="Make Installed Build Linux" -script=Engine/Build/InstalledEngineBuild.xml -set:HostPlatformOnly=true #347

Open dev-fredericfox opened 8 months ago

dev-fredericfox commented 8 months ago

Output of the ue4-docker info command:

Me@My-MBP ~ % ue4-docker info
/Library/Python/3.9/site-packages/urllib3/__init__.py:34: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
ue4-docker version:         0.0.111 (latest available version is 0.0.111)
Operating system:           macOS 14.2.1 (Kernel Version 23.2.0)
Docker daemon version:      24.0.7
NVIDIA Docker supported:    No
Maximum image size:         No limit detected
Available disk space:       Unknown (typically means the Docker daemon is running in a Moby VM, e.g. Docker Desktop)
Total system memory:        128 GiB physical, 1 GiB virtual
CPU:                        16 physical, 16 logical (arm)

Additional details:

The RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph -target="Make Installed Build Linux" step stops at a random point, usually between 200/3994 and 600/3994, with no consistent pattern and no error message. The process keeps running but never makes further progress.

Example 01 Stuck at 234:

[+] Building 1070.9s (18/33)                                                                                                                                                                              docker:desktop-linux
 => [internal] load .dockerignore                                                                                                                                                                                         0.0s
 => => transferring context: 53B                                                                                                                                                                                          0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                                      0.0s
 => => transferring dockerfile: 8.11kB                                                                                                                                                                                    0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                            0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                         0.0s
 => [internal] load build context                                                                                                                                                                                         0.0s
 => => transferring context: 307B                                                                                                                                                                                         0.0s
 => CACHED [stage-1 1/8] FROM docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                               0.0s
 => [builder  1/20] FROM docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                                 0.0s
 => CACHED [builder  2/20] COPY set-changelist.py /tmp/set-changelist.py                                                                                                                                                  0.0s
 => CACHED [builder  3/20] RUN python3 /tmp/set-changelist.py /home/ue4/UnrealEngine/Engine/Build/Build.version $CHANGELIST && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to d  0.0s
 => CACHED [builder  4/20] RUN rm -rf /home/ue4/UnrealEngine/.git && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && echo 'Note that for large filesystem layers this   0.0s
 => CACHED [builder  5/20] COPY enable-opengl.py /tmp/enable-opengl.py                                                                                                                                                    0.0s
 => CACHED [builder  6/20] RUN python3 /tmp/enable-opengl.py /home/ue4/UnrealEngine/Engine/Config/BaseEngine.ini && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && ec  0.0s
 => CACHED [builder  7/20] COPY patch-filters-xml.py /tmp/patch-filters-xml.py                                                                                                                                            0.0s
 => CACHED [builder  8/20] RUN python3 /tmp/patch-filters-xml.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineFilters.xml && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer   0.0s
 => CACHED [builder  9/20] COPY patch-build-graph.py /tmp/patch-build-graph.py                                                                                                                                            0.0s
 => CACHED [builder 10/20] RUN python3 /tmp/patch-build-graph.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineBuild.xml /home/ue4/UnrealEngine/Engine/Build/Build.version && echo '' && echo 'RUN directive comple  0.0s
 => CACHED [builder 11/20] RUN ./Engine/Build/BatchFiles/Linux/Build.sh ShaderCompileWorker Linux Development -SkipBuild -buildubt && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem lay  0.0s
 => CACHED [builder 12/20] WORKDIR /home/ue4/UnrealEngine                                                                                                                                                                 0.0s
 => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph     -target="Make Installed Build Linux"     -script=Engine/Build/InstalledEngineBuild.xml     -set:HostPlatformOnly=true     -set:WithDDC=tru  1070.9s
 => => # [229/3994] Compile Module.Chaos.3.cpp                                                                                                                                                                                
 => => # [230/3994] Link (lld) libUnrealEditor-TextureBuildUtilities.so                                                                                                                                                       
 => => # [231/3994] Compile Module.Chaos.10.cpp                                                                                                                                                                               
 => => # [232/3994] Compile Module.AppFramework.3.cpp                                                                                                                                                                         
 => => # [233/3994] Compile Module.OpenColorIOWrapper.cpp                                                                                                                                                                     
 => => # [234/3994] Link (lld) libUnrealEditor-OpenColorIOWrapper.so                    

Example 02 Stuck at 476:

[+] Building 1592.6s (18/33)                                                                                                                                                                              docker:desktop-linux
 => [internal] load build definition from Dockerfile                                                                                                                                                                      0.0s
 => => transferring dockerfile: 8.11kB                                                                                                                                                                                    0.0s
 => [internal] load .dockerignore                                                                                                                                                                                         0.0s
 => => transferring context: 53B                                                                                                                                                                                          0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                            0.0s
 => [internal] load metadata for docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                         0.0s
 => [internal] load build context                                                                                                                                                                                         0.0s
 => => transferring context: 307B                                                                                                                                                                                         0.0s
 => CACHED [stage-1 1/8] FROM docker.io/adamrehn/ue4-build-prerequisites:opengl-ubuntu22.04                                                                                                                               0.0s
 => [builder  1/20] FROM docker.io/adamrehn/ue4-source:wyrdue532v0.0.1-opengl-ubuntu22.04                                                                                                                                 0.0s
 => CACHED [builder  2/20] COPY set-changelist.py /tmp/set-changelist.py                                                                                                                                                  0.0s
 => CACHED [builder  3/20] RUN python3 /tmp/set-changelist.py /home/ue4/UnrealEngine/Engine/Build/Build.version $CHANGELIST && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to d  0.0s
 => CACHED [builder  4/20] RUN rm -rf /home/ue4/UnrealEngine/.git && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && echo 'Note that for large filesystem layers this   0.0s
 => CACHED [builder  5/20] COPY enable-opengl.py /tmp/enable-opengl.py                                                                                                                                                    0.0s
 => CACHED [builder  6/20] RUN python3 /tmp/enable-opengl.py /home/ue4/UnrealEngine/Engine/Config/BaseEngine.ini && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer to disk.' && ec  0.0s
 => CACHED [builder  7/20] COPY patch-filters-xml.py /tmp/patch-filters-xml.py                                                                                                                                            0.0s
 => CACHED [builder  8/20] RUN python3 /tmp/patch-filters-xml.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineFilters.xml && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem layer   0.0s
 => CACHED [builder  9/20] COPY patch-build-graph.py /tmp/patch-build-graph.py                                                                                                                                            0.0s
 => CACHED [builder 10/20] RUN python3 /tmp/patch-build-graph.py /home/ue4/UnrealEngine/Engine/Build/InstalledEngineBuild.xml /home/ue4/UnrealEngine/Engine/Build/Build.version && echo '' && echo 'RUN directive comple  0.0s
 => CACHED [builder 11/20] RUN ./Engine/Build/BatchFiles/Linux/Build.sh ShaderCompileWorker Linux Development -SkipBuild -buildubt && echo '' && echo 'RUN directive complete. Docker will now commit the filesystem lay  0.0s
 => CACHED [builder 12/20] WORKDIR /home/ue4/UnrealEngine                                                                                                                                                                 0.0s
 => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph     -target="Make Installed Build Linux"     -script=Engine/Build/InstalledEngineBuild.xml     -set:HostPlatformOnly=true     -set:WithDDC=tru  1592.6s
 => => # [471/3994] Compile Module.Engine.18.cpp                                                                                                                                                                              
 => => # [472/3994] Compile Module.Engine.59.cpp                                                                                                                                                                              
 => => # [473/3994] Compile Module.Engine.12.cpp                                                                                                                                                                              
 => => # [474/3994] Compile Module.Engine.15.cpp                                                                                                                                                                              
 => => # [475/3994] Compile Module.Engine.20.cpp                                                                                                                                                                              
 => => # [476/3994] Compile Module.Engine.65.cpp

slonopotamus commented 8 months ago

How much RAM/CPUs is allocated to Docker VM?

dev-fredericfox commented 8 months ago

How much RAM/CPUs is allocated to Docker VM?

50 GB RAM / 16 CPUs / 880 GB disk

dev-fredericfox commented 8 months ago

One of the issues seems to be related to multithreading, although this is really not my area of expertise. When I reduce my Docker CPUs to 1, I don't get stuck in the compiling phase (this, however, takes several days). However, the output is now clipped, so I am not sure how to keep debugging.

Output when running with only 1 CPU:

 => [builder 13/20] RUN ./Engine/Build/BatchFiles/RunUAT.sh BuildGraph     -target="Make Installed Build Linux"     -script=Engine/Build/InstalledEngin  235804.3s
 => => # LogShaderCompilers: Display:                         TBasePassPSFNoLightMapPolicySkylight - 5.17% of total time (compiled   38 times, average 77.80 sec, 
 => => # max 236.41 sec, min 44.25 sec)                                                                                                                           
 => => # LogShaderCompilers: Display: TBasePassPSFPrecomputedVolumetricLightmapLightingPolicySkylight - 4.47% of total time (compiled   32 times, average 79.90 se
 => => # c, max 239.03 sec, min 51.85 sec)                                                                                                                        
 => => # Log                                                                                                                                                      
 => => # [output clipped, log limit 2MiB reached]

slonopotamus commented 8 months ago

Given that you have plenty of RAM, it might possibly be easier to spin up a Linux VM and run ue4-docker inside it.

TBBle commented 8 months ago

When running with only one CPU (which is also the default when doing the build in Hyper-V isolation on Windows), you may be bitten by an issue in the UE build system: the build management system keeps a whole CPU core busy checking for progress from the shader compiler processes. As a result, the shader compiler processes themselves get little-to-no CPU time and don't progress, leading to an unexpected tens-of-hours build.

I only found this in UE4. I sent them a bug report, but I don't recall them accepting my fix (a sleep in the loop checking for shader compiler progress) or otherwise addressing it; a single-core development environment is not supported, after all. In a multi-core environment, I couldn't demonstrate a build-time improvement from my fix either, which surprised me.

But that sounds like what you hit here in your single-CPU attempt. So maybe try with two cores and see if that avoids both the compile hang and the shader compiler issue.
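
To illustrate the kind of fix described above (a sleep in the loop that checks for progress), here is a minimal Python sketch. It is not Unreal's code: the function name `wait_for_workers` and its parameters are hypothetical, and it only shows why a sleeping poll loop leaves CPU time for the workers while a bare spin loop does not.

```python
import time


def wait_for_workers(is_done, poll_interval=0.05, timeout=60.0):
    """Poll is_done() until it returns True, sleeping between checks.

    Returns the number of polls made. Raises TimeoutError if the
    deadline passes before is_done() reports completion.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("workers did not finish in time")
        # Yield the CPU so the worker processes can be scheduled.
        # Without this sleep the loop spins at 100% of one core,
        # which on a single-core machine starves the workers.
        time.sleep(poll_interval)
```

On a single core, the only difference between the two behaviours is the `time.sleep` call: removing it turns the loop into the busy-wait described above.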

dev-fredericfox commented 8 months ago

I see. I tested with 2 CPU cores and sadly it gets stuck fairly early in the process.

Sometimes when I cancel the process after being stuck I notice this error message (not always). Could be related?

[Screenshot 2024-01-17 at 11 52 29]

TBBle commented 8 months ago

That error is Python seeing a Control-C in a thread, presumably because you're hitting Control-C to cancel the process. I don't believe it's related.

dev-fredericfox commented 8 months ago

@slonopotamus

Given that you have plenty of RAM, it might possibly be easier to spin up a Linux VM and run ue4-docker inside it.

Trying that right now, but first tests show that even in a UTM VM (Debian 12 Rosetta Virtualization) it still gets stuck. I will try full emulation later, but the performance is going to be horrendous.

[Screenshot 2024-01-24 at 00 39 12]

dev-fredericfox commented 8 months ago

Emulation seems to be broken as well, or at least it's unreasonably slow to use. It has been "stuck" without progress on step [builder 11/20] for the past 8 hours, fans on full blast. Are people running this package primarily on Windows? Why does this seem to affect only me?

slonopotamus commented 8 months ago

We either run natively on Windows or on Linux.

dev-fredericfox commented 8 months ago

We either run natively on Windows or on Linux.

But only on amd64, or does it also work on Linux ARM?

dev-fredericfox commented 8 months ago

I think I found a "sort of" workaround for now.

When step 13 fails, I run docker run -it and run the step 13 commands manually from inside the container. When the compile freezes I kill the tasks and, since the build is incremental, I just relaunch it. Looks to be working for now.
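
The kill-and-relaunch loop described above could be automated along these lines. This is a rough sketch under stated assumptions: it runs on Linux inside the container, it treats "no output for `stall_seconds`" as a freeze, and the function names `run_with_stall_timeout` and `build_until_done` are hypothetical helpers, not part of ue4-docker.

```python
import os
import signal
import subprocess
import sys
import threading
import time


def run_with_stall_timeout(cmd, stall_seconds, poll_interval=0.1):
    """Run cmd, echoing its output; kill it if it goes silent.

    Returns the exit code, or None if the process group was killed
    because no output arrived for stall_seconds.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # own process group, so children die too
    )
    last_output = time.monotonic()

    def pump():
        nonlocal last_output
        for line in proc.stdout:
            last_output = time.monotonic()
            sys.stdout.write(line)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    while proc.poll() is None:
        if time.monotonic() - last_output > stall_seconds:
            try:
                # Kill the whole group, not just the parent script.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process exited on its own just now
            proc.wait()
            return None
        time.sleep(poll_interval)
    reader.join(timeout=5)
    return proc.returncode


def build_until_done(cmd, stall_seconds, max_attempts=10):
    """Relaunch cmd after each stall; return the first real exit code."""
    for attempt in range(1, max_attempts + 1):
        code = run_with_stall_timeout(cmd, stall_seconds)
        if code is not None:
            return code
        print(f"attempt {attempt} stalled, relaunching", file=sys.stderr)
    raise RuntimeError("build stalled on every attempt")
```

The command list passed in would be the step 13 RunUAT.sh invocation. The stall threshold needs to be generous, since a single large compile or link step can legitimately stay silent for a while.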

Only question is: how do I commit this stage manually to the layer so I can proceed to step 14? I could do a docker commit, but AFAIK this creates a new image. How will the ue4-docker script know to look for the image with the manually committed changes? Any input appreciated!

TBBle commented 8 months ago

The only way you could use docker commit and then continue the image build from there would be to change the Dockerfile to have a FROM for that committed image at that point. It seems like a lot of hassle.

Can you use docker exec to inspect the hung container build stage with top or similar? (I honestly don't remember if you can do that...) I kind-of suspect this is an Unreal-level bug, some kind of shared resource or busy-wait that's deadlocking. If your CPU load is causing your fans to run, then it thinks it's doing something and, as I mentioned earlier, I know of at least one busy-wait that used to exist in the system, and may still do.

(Actually, you can use top from outside the container to inspect the processes inside it, but I believe the defaults hide processes in different PID namespaces...)

Oh, right, you can reproduce this in a docker run, so you can definitely docker exec in and use top to inspect that state.

My guess is that you've got all your cores busy-waiting, and no actual build processes are advancing. The fact that two cores get stuck early suggests this: the build accumulates more busy-waiters over time, until they happen to fill all the available cores simultaneously. If that turns out to be the case, it may be possible to renice the busy-waiters from outside the container, in order to get the build to resume progress. That'll be a little fiddly, but less so than trying to inject the manually built container into the build workflow.
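
A minimal sketch of the renice idea, assuming it runs somewhere the build's PIDs are visible (inside the container via docker exec, or on the Linux host) and with enough privileges to change their priority. The helper names `find_cpu_hogs` and `renice` are hypothetical. Note that the %CPU column from ps is averaged over each process's lifetime, so top may be the better tool for spotting a recent spin.

```python
import os


def find_cpu_hogs(ps_output, min_cpu=90.0):
    """Parse `ps -eo pid,ni,pcpu,comm` output for high-CPU processes.

    Returns a list of (pid, cpu_percent, command) tuples.
    """
    hogs = []
    for line in ps_output.strip().splitlines()[1:]:  # skip header row
        fields = line.strip().split(None, 3)
        if len(fields) < 4:
            continue
        pid, _nice, cpu, command = fields
        if float(cpu) >= min_cpu:
            hogs.append((int(pid), float(cpu), command))
    return hogs


def renice(pids, niceness=19):
    """Lower the scheduling priority of each PID (19 is the lowest)."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

The output of `ps -eo pid,ni,pcpu,comm` would be passed to `find_cpu_hogs`, and the resulting PIDs to `renice` once it is clear which of them are the waiters rather than the compilers doing real work.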