KhronosGroup / DockerContainers

Docker container specifications which package dependencies for building Khronos documentation and software
Apache License 2.0

Support direct command invocation in vulkan-docs #7

Closed nvpbrown closed 4 years ago

nvpbrown commented 4 years ago

I just started using the Khronos Docker image and successfully built the Vulkan spec HTML. I'm happy to see that the HTML loads much faster in my default browser (Firefox).

One thing I want to be able to do is trigger a build inside the Docker container directly, without going through an interactive shell, so that my existing "build anything" script can kick off builds from a shell on my Windows development system. This works with the generic Ubuntu Docker image, but not with the Khronos vulkan-docs image:

C:\Windows\system32>docker run --rm --name test ubuntu /bin/ls -C /
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

C:\Windows\system32>docker run --rm --name test khronosgroup/docker-images:vulkan-docs /bin/ls -C /
** Creating user vulkan id 1000
HOME=/home/vulkan USER=vulkan CONTAINER_CWD=
** ignoring entrypoint.vulkan.sh args - length was 13
** About to gosu vulkan /bin/bash

It looks like the problem might be related to this override in /root/entrypoint.vulkan.sh that seems to trigger an interactive Bash shell no matter what you pass on the command line:

# Default to 'bash' if no arguments are provided
args="$@"
if [ -z "$args" ]; then
    args=/bin/bash
else
    # Actually, always use it, because CI appears to be passing in some horrid bash script as the arguments
    echo -n "** ignoring entrypoint.vulkan.sh args - length was "
    echo $args | wc -c
    args=/bin/bash
fi
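
For what it's worth, the usual argument-preserving pattern looks something like the sketch below (untested on my part, and keeping the script's existing vulkan user). Note that args="$@" also flattens the argument list into a single space-joined string, which loses the original word boundaries - possibly related to the garbled CI arguments mentioned in the comment.

# Sketch: default to bash, otherwise pass the arguments through verbatim
if [ "$#" -eq 0 ]; then
    set -- /bin/bash
fi
exec gosu vulkan "$@"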

Thanks, Pat

oddhack commented 4 years ago

I stole entrypoint.vulkan.sh from something @rpavlik did for the OpenXR containers. I tweaked it a bit because it hit various errors when I started the image with various combinations of username / UID / groupname / GID, some of which already existed in the Docker environment. The goal was simply to be able to run the image locally and map it to your own UID/GID, so that when using a bind mount to reach a repo in the host filesystem, the repo doesn't end up littered with root-owned files afterwards (thumbs down to finding that Docker defaults to running as root in this circumstance, and that the Debian installation happily lets it write root-owned files - apparently this is considered a sane and safe default for a random user of Docker!).
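
(For a local one-off there is also the usual workaround of mapping the host user directly - a sketch, assuming a Linux host shell:

docker run --rm --user "$(id -u):$(id -g)" -v "$PWD:/work" -w /work ubuntu touch newfile

newfile then comes out owned by the host user, but the process runs as a UID with no passwd entry, home directory, or username inside the container - which is exactly the gap the entrypoint script papers over.)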

For reasons beyond my feeble comprehension, after I got that much working, the script started complaining vociferously about trying to invoke the code passed in its arguments - which appeared to be /bin/bash followed by about 600 characters of indecipherable script - when coming through GitLab CI. Since every minor tweak to the entrypoint took 10 minutes of turnaround (rebuild the image, push it to dockerhub, restart a CI job to see what the change did), I decided that for the immediate purpose it would do to just always run bash. And this works, to the extent that I now have CI running in the internal vulkan repo using this image, with a 3x speedup in CI runs.

But in addition to the problem you have run into, @rpavlik tells me that there's a general issue with entrypoint.sh when run under Azure CI, which is what I want to use to get CI on the github repo up and running. So while I am open to someone who understands WTF is actually happening fixing the entrypoint script, I think the simplest path forward is to do what Ryan did for OpenXR - build two versions of the Vulkan image, one with no entrypoint script and one with the current one. You could then use the no-entrypoint version to do the thing you want to do. I don't have any other suggestions at present, because I simply have no comprehension of what makes Ryan's version of the entrypoint script bomb out in vulkan CI, and it took so long to get it functioning that I do not want to dive back into debugging it with a latency closer to an IBM 370 batch job(*) than to interactive work.

(*) Yes, I actually spent one summer working in an IBM 370 shop. Thankfully I was the only person in the company allowed to use APL instead of COBOL or ASM, since I was merely an intern. This came in handy when they had a problem that needed a three-dimensional array, something apparently extremely difficult to represent in whatever version of COBOL they were using.

nvpbrown commented 4 years ago

Thanks for the info -- the CI stuff definitely sounds messy.

I now have functional parity with what I had on Windows previously. I typically try to use native tools, but decided that was too messy for Vulkan spec builds and installed Cygwin. There I used a dedicated shell for building Vulkan content, and I've now added a script that launches bash in the Docker container as its own dedicated shell. While it would be cool to launch a build directly from my main shell, the separate shell is good enough for my purposes.

rpavlik commented 4 years ago

So at least for OpenXR, using Windows Subsystem for Linux has been a great technique. It is an extreme pain that the turnaround on this stuff is close to a System/370 batch job - Docker is a pain at times.

My splitting of the entrypoint script into a derived image isn't very complicated: it's mostly just renaming the main image as "base" and then making an almost-empty Dockerfile for the interactive entry point. For example:
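
A minimal sketch of such a derived Dockerfile (the tag and script path are my guesses for the Vulkan case, assuming the entrypoint-free image would be called vulkan-docs-base):

# Interactive variant, layered on the entrypoint-free base image
FROM khronosgroup/docker-images:vulkan-docs-base
COPY entrypoint.vulkan.sh /root/entrypoint.vulkan.sh
ENTRYPOINT ["/root/entrypoint.vulkan.sh"]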

I could probably do that split for you if you want, @oddhack

I similarly have little interest in figuring out why the entrypoint breaks for Vulkan but works fine for OpenXR ;)

oddhack commented 4 years ago

> So at least for OpenXR, using Windows Subsystem for Linux has been a great technique. It is an extreme pain that the turnaround on this stuff is close to a System/370 batch job - Docker is a pain at times.
>
> My splitting of the entrypoint script into a derived image isn't very complicated: it's mostly just renaming the main image as "base" and then making an almost-empty Dockerfile for the interactive entry point.
>
> I could probably do that split for you if you want, @oddhack

Thanks - I went ahead in #8 but feedback welcome if I did something silly.

@nvpbrown AFAICT this will let you do what you want by invoking the vulkan-docs-base image from your script, instead of the vulkan-docs image. However, you won't be able to try it out until either I've merged the PR and pushed updated images to dockerhub, or you build the images locally. Hoping I can merge it on Friday as it's a simple refactoring.
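
(In other words, assuming #8 lands as described and the base image really carries no entrypoint script, your original test should then work as:

docker run --rm --name test khronosgroup/docker-images:vulkan-docs-base /bin/ls -C /

with the directory listing coming back directly.)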

oddhack commented 4 years ago

BTW I second @rpavlik's suggestion of using WSL. I don't see any use for Cygwin unless you're stuck on an old version of Windows. Much though I am surprised to say it, Microsoft has been doing some good stuff.

nvpbrown commented 4 years ago

This sounds good. I will try this out when an image is available.

I agree with the WSL recommendation in general terms. I was actually using WSL with an Ubuntu image to build Vulkan materials successfully a while back. Unfortunately, I had to disable WSL on my primary development system because I kept getting sporadic BSODs triggered by some sort of filesystem redirection driver that WSL uses - even when I wasn't actively working in the WSL environment.

I'm hoping that the upcoming "WSL2" will work better for me.

oddhack commented 4 years ago

@nvpbrown vulkan-docs-base image is on dockerhub now.

nvpbrown commented 4 years ago

Thanks! I verified that this is behaving as expected. Closing.

In case it ends up being helpful to anyone else: I replaced the relevant system() calls in my Perl script with calls to the following function, extracted from the script I wrote to launch an interactive shell.

# Run a command in the Vulkan docs container, with $dir bind-mounted
# as the container's working directory.
sub vkdocker {
    my ($dir, @cmd) = @_;
    my @docker = ( "docker", "run", "-it", "--rm");
    push @docker, ("-v", "$dir:/vulkan");    # mount the spec tree at /vulkan
    push @docker, ("-w", "/vulkan");         # run the command from the mount point
    push @docker, ("khronosgroup/docker-images:vulkan-docs-base", @cmd);
    system(@docker);
}
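
A build then ends up looking something like this (the host path here is just a placeholder; "html" is the usual Vulkan-Docs Makefile target):

vkdocker("C:/src/Vulkan-Docs", "make", "html");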

YMMV. On Windows, I don't appear to have a reason to override the user name or ID.