aws / ec2-macos-init

EC2 macOS Init is the launch daemon used to initialize Mac instances within EC2.
https://aws.amazon.com/ec2/instance-types/mac/
Apache License 2.0

Service won't start on boot #9

Open kayatf opened 2 years ago

kayatf commented 2 years ago

Dear ec2-macos-init contributors,

I've now tried multiple times (both directly through launchd and through ec2-macos-init) to run a simple binary on system boot. Its purpose is to join the machine to an Active Directory domain, among other things. The most I've managed with this config so far is that the script ran correctly, but only a few seconds after a (random) user logged in to the machine, not on boot.

The config looks like this:

 [[Module]]
  Name = "SetupStartup"
  PriorityGroup = 4
  [Module.Command]
    Cmd = ["/usr/local/bin/startup.sh.x"]
    # I've tried this both with root and ec2-user
    RunAsUser = "ec2-user" 

The binary is executable (chmod a+x).

mattcataws commented 2 years ago

Hey @rescaux, I'm sorry to hear that you're having issues with ec2-macos-init.

Would you be able to include any error message that was logged by ec2-macos-init? The log file's location is /var/log/amazon/ec2/ec2-macos-init.log.

Also, it's strange that the script was only executed after a user logged in. I've tried reproducing the problem with the config you posted and my own test script, but I wasn't able to see the exact behavior that you're describing.

There was an error with the module you included -- it's missing a run type configuration (RunOnce, RunPerBoot, or RunPerInstance). Once I added that, ec2-macos-init was able to run my test script just fine. I'll include my test environment below.

My config looked like this:

[[Module]]
    Name = "SetupStartup"
    PriorityGroup = 4
    RunPerBoot = true
    [Module.Command]
        Cmd = ["/usr/local/bin/startup.sh.x"]
        RunAsUser = "ec2-user"

The contents of /usr/local/bin/startup.sh.x:

#!/bin/bash

echo "Hello World!"

The permissions I have on the script:

-rwxr-xr-x  1 ec2-user  admin  33 Apr 20 19:00 /usr/local/bin/startup.sh.x

The logs from priority level 4 after restarting the instance:

2022/04/20 19:15:26.399460 Processing priority level 4 (2 modules)...
2022/04/20 19:15:26.399472 Running module [SetupStartup] (type: command, group: 4)
2022/04/20 19:15:26.399489 Skipping module [ExecuteUserData] (type: userdata, group: 4) due to Run type setting
2022/04/20 19:15:26.451198 Successfully completed module [SetupStartup] (type: command, group: 4) with message: successfully ran command [[/usr/local/bin/startup.sh.x]] with stdout [Hello World!] and stderr []
2022/04/20 19:15:26.451252 Successfully completed processing of priority level 4

Could you try adding the run type configuration to your module and let me know what happens? Any error logs from that run would also be useful.
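
A quick way to spot modules that are missing a run type is to scan the config for [[Module]] blocks without one. The following is a minimal sketch and not part of ec2-macos-init: the function name modules_missing_run_type is made up for illustration, and it assumes one key per line, as in the configs above.

```shell
#!/bin/sh
# Hypothetical helper (not part of ec2-macos-init): print the Name of every
# [[Module]] block that does not set RunOnce, RunPerBoot, or RunPerInstance
# to true. Assumes one key per line, as in the configs shown in this thread.
modules_missing_run_type() {
  awk '
    function report() { if (in_module && !has_run) print name }
    /^[[:space:]]*\[\[Module\]\]/ {
      report()
      in_module = 1; has_run = 0; name = "(unnamed)"
      next
    }
    in_module && /^[[:space:]]*Name[[:space:]]*=/ {
      name = $0
      sub(/^[^"]*"/, "", name)   # drop everything up to the opening quote
      sub(/".*$/, "", name)      # drop the closing quote and anything after
    }
    in_module && /^[[:space:]]*(RunOnce|RunPerBoot|RunPerInstance)[[:space:]]*=[[:space:]]*true/ {
      has_run = 1
    }
    END { report() }
  ' "$1"
}

# Example: modules_missing_run_type /usr/local/aws/ec2-macos-init/init.toml
```

Run against the config from the original report, this would list SetupStartup; a config where every module has a run type produces no output.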

kayatf commented 2 years ago

Okay, I've now tried everything suggested so far and it's still behaving strangely. As one can see in this video, the script runs only when ec2-macos-init is invoked manually. It is invoked on boot as well, but doesn't do anything.

jahkeup commented 2 years ago

Hi @rescaux, thanks for sharing the logs~

What I managed with this config so far was, that the script got executed correctly - but only a few seconds after a (random) user logged in to the machine - not on boot.

@mattcataws and I are wondering where in the script this hangs. For example, does the script hang on the domain join, or when setting up the VPN connection?

From the description provided and the logs in your video, it seems like the script might be invoking a command that requires an active user login to complete (maybe a prompt, or initialization of some state by the OS).
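
One way to narrow down where the script stalls is to log a timestamp before and after each step, so that the last line written before a login shows which command was waiting. This is a minimal sketch: the helper name run_step and the log path are made up, and the commands in the usage comment are placeholders, not commands from the actual startup script.

```shell
#!/bin/sh
# Hypothetical tracing helper: the log path and step names are illustrative.
STEP_LOG="${STEP_LOG:-/tmp/startup-steps.log}"

# run_step <label> <command> [args...]
# Logs when the command starts and how it exits, then returns its exit status.
run_step() {
  label="$1"
  shift
  printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$label" >> "$STEP_LOG"
  "$@"
  status=$?
  printf '%s END %s (exit %s)\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$label" "$status" >> "$STEP_LOG"
  return "$status"
}

# Example usage inside the startup script (placeholder commands):
#   run_step "domain-join" /path/to/domain-join-command
#   run_step "vpn-setup"   /path/to/vpn-setup-command
```

A START line without a matching END line in the log would point at the step that blocks until a user session exists.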

rj93 commented 2 years ago

Hi @jahkeup, I have a similar issue. I am using mac2.metal and ec2-macos-init version 1.5.3 [2022-06-09 16:07:51 -0700], and in my ec2-macos-init config I have the following:

### Group 6
## Custom services

[[Module]]
  Name = "StartGitlabRunner"
  PriorityGroup = 6
  RunPerBoot = true
  [Module.Command]
    Cmd = ["/Users/ec2-user/.gitlab-runner/run.sh"]
    RunAsUser = "ec2-user"

with my script as follows:

#!/bin/bash

if pgrep -x gitlab-runner >/dev/null
then
  echo "Skipping starting gitlab-runner as process is already running"
else
  echo "Starting gitlab-runner process..."
  /opt/homebrew/bin/gitlab-runner run &> /Users/ec2-user/.gitlab-runner/runner.log &
  echo "Started gitlab-runner process"
fi

echo "process id: $(pgrep -x gitlab-runner)"

The logs from the system restart are:

2022/07/15 13:41:37.774763 Fetching instance ID from IMDS...
2022/07/15 13:41:37.811484 Unable to get instance ID - IMDS may not be available yet...retrying every 1s [0/600]
2022/07/15 13:41:41.822631 Running on instance i-03b4cf7416cbc2031
2022/07/15 13:41:41.822802 Reading init config...
2022/07/15 13:41:41.829855 Successfully read init config
2022/07/15 13:41:41.829921 Validating config...
2022/07/15 13:41:41.830192 Successfully validated config
2022/07/15 13:41:41.830222 Prioritizing modules...
2022/07/15 13:41:41.830257 Successfully prioritized modules
2022/07/15 13:41:41.830281 Creating instance history directories for current instance...
2022/07/15 13:41:41.830420 Successfully created directories
2022/07/15 13:41:41.830449 Getting instance history...
2022/07/15 13:41:41.838421 Successfully gathered instance history
2022/07/15 13:41:41.838481 Processing priority level 1 (2 modules)...
2022/07/15 13:41:41.839368 Running module [UnmountLocalSSD] (type: command, group: 1)
2022/07/15 13:41:41.844034 Running module [DisableEthernet] (type: command, group: 1)
2022/07/15 13:41:43.131293 Successfully completed module [UnmountLocalSSD] (type: command, group: 1) with message: successfully ran command [[/bin/zsh -c diskutil list internal physical | egrep -o '^/dev/disk\d+' | xargs diskutil eject || true]] with stdout [] and stderr [Volume failed to eject]
2022/07/15 13:41:53.761170 Successfully completed module [DisableEthernet] (type: command, group: 1) with message: successfully ran command [[/usr/sbin/networksetup -setnetworkserviceenabled Ethernet off]] with stdout [] and stderr []
2022/07/15 13:41:53.761301 Successfully completed processing of priority level 1
2022/07/15 13:41:53.761326 Processing priority level 2 (1 modules)...
2022/07/15 13:41:53.761374 Running module [CheckNetworkIsUp] (type: networkcheck, group: 2)
2022/07/15 13:41:53.772429 Successfully completed module [CheckNetworkIsUp] (type: networkcheck, group: 2) with message: successfully pinged default gateway with a RTT of 368µs
2022/07/15 13:41:53.772520 Successfully completed processing of priority level 2
2022/07/15 13:41:53.772545 Processing priority level 3 (11 modules)...
2022/07/15 13:41:53.772609 Running module [DisableWiFi] (type: command, group: 3)
2022/07/15 13:41:53.772628 Running module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3)
2022/07/15 13:41:53.772680 Skipping module [ManageEC2User] (type: usermanagement, group: 3) due to Run type setting
2022/07/15 13:41:53.772720 Running module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3)
2022/07/15 13:41:53.772810 Skipping module [SetDefaultTimezone] (type: command, group: 3) due to Run type setting
2022/07/15 13:41:53.773784 Skipping module [DisableSleep] (type: command, group: 3) due to Run type setting
2022/07/15 13:41:53.773854 Skipping module [NeverSleep] (type: command, group: 3) due to Run type setting
2022/07/15 13:41:53.773885 Skipping module [NeverSleepDisplay] (type: command, group: 3) due to Run type setting
2022/07/15 13:41:53.774044 Running module [UpdateMOTD] (type: motd, group: 3)
2022/07/15 13:41:53.774129 Skipping module [RemoveSSHGroup] (type: command, group: 3) due to Run type setting
2022/07/15 13:41:53.772618 Skipping module [SetAmazonTimeSync] (type: command, group: 3) due to Run type setting
2022/07/15 13:41:53.774703 Did not modify SSHD configuration
2022/07/15 13:41:53.787156 Successfully completed module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3) with message: system configuration completed with [0 changed / 1 unchanged / 0 error(s)] out of 1 requested changes
2022/07/15 13:41:53.800800 Modified sysctl property [net.inet.tcp.autosndbufmax=33554432]
2022/07/15 13:41:53.802630 Modified sysctl property [net.inet.tcp.sendspace=1048576]
2022/07/15 13:41:53.802706 Modified sysctl property [net.link.generic.system.rcvq_maxlen=1024]
2022/07/15 13:41:53.803838 Modified sysctl property [kern.aiothreads=64]
2022/07/15 13:41:53.803945 Modified sysctl property [net.inet.tcp.win_scale_factor=8]
2022/07/15 13:41:53.804990 Did not modify default [AutomaticCheckEnabled]
2022/07/15 13:41:53.805009 Did not modify default [AutomaticDownload]
2022/07/15 13:41:53.805111 Did not modify default [AutomaticallyInstallMacOSUpdates]
2022/07/15 13:41:53.805242 Modified sysctl property [net.inet.tcp.autorcvbufmax=33554432]
2022/07/15 13:41:53.805271 Modified sysctl property [kern.aiomax=900]
2022/07/15 13:41:53.805493 Did not modify default [CriticalUpdateInstall]
2022/07/15 13:41:53.805659 Did not modify default [ConfigDataInstall]
2022/07/15 13:41:53.806199 Modified sysctl property [net.inet.tcp.recvspace=1048576]
2022/07/15 13:41:53.814463 Successfully completed module [UpdateMOTD] (type: motd, group: 3) with message: successfully updated motd file [/etc/motd] with version string [macOS Monterey 12.4]
2022/07/15 13:41:53.837840 Error while running module [DisableWiFi] (type: command, group: 3) with message:  and err: ec2macosinit: error executing command [[/bin/zsh -c wifidevice="$(networksetup -listallhardwareports | grep -A 1 "Wi-Fi" | tail -n 1 | cut -d " " -f2)"; if [[ ! -z $wifidevice ]]; then networksetup -setairportpower $wifidevice off; fi]] with stdout [Wireless is not available.
** Error: Wireless is not currently available on this machine.] and stderr []: exit status 9
2022/07/15 13:41:53.905767 Modified sysctl property [kern.aioprocmax=256]
2022/07/15 13:41:53.905875 Successfully completed module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3) with message: system configuration completed with [9 changed / 5 unchanged / 0 error(s)] out of 14 requested changes
2022/07/15 13:41:53.905917 Successfully completed processing of priority level 3
2022/07/15 13:41:53.905935 Processing priority level 4 (1 modules)...
2022/07/15 13:41:53.905968 Skipping module [GetSSHKeys] (type: sshkeys, group: 4) due to Run type setting
2022/07/15 13:41:53.905986 Successfully completed processing of priority level 4
2022/07/15 13:41:53.906000 Processing priority level 5 (1 modules)...
2022/07/15 13:41:53.906021 Skipping module [ExecuteUserData] (type: userdata, group: 5) due to Run type setting
2022/07/15 13:41:53.906036 Successfully completed processing of priority level 5
2022/07/15 13:41:53.906046 Processing priority level 6 (1 modules)...
2022/07/15 13:41:53.906067 Running module [StartGitlabRunner] (type: command, group: 6)
2022/07/15 13:41:53.963004 Successfully completed module [StartGitlabRunner] (type: command, group: 6) with message: successfully ran command [[/Users/ec2-user/.gitlab-runner/run.sh]] with stdout [Starting gitlab-runner process...
Started gitlab-runner process
process id: 433] and stderr []
2022/07/15 13:41:53.963070 Successfully completed processing of priority level 6
2022/07/15 13:41:53.963082 Writing instance history for instance i-03b4cf7416cbc2031...
2022/07/15 13:41:53.969077 Successfully wrote instance history
2022/07/15 13:41:53.969110 EC2 macOS Init completed in 12.151719875s

This suggests that the gitlab-runner process actually starts, but then exits.

I've tried adding a sleep 600 to the end of the run.sh file and when SSHing onto the instance I can see the process running:

ec2-user@ip-10-211-190-39 ~ % ps -ef | grep gitlab
  501   428   109   0  1:55pm ??         0:00.00 /bin/bash /Users/ec2-user/.gitlab-runner/run.sh
  501   431   428   0  1:55pm ??         0:00.31 /opt/homebrew/bin/gitlab-runner run
  501   503   487   0  1:56pm ttys000    0:00.00 grep gitlab

After waiting for the sleep 600 to finish, the process is killed:

ec2-user@ip-10-211-190-39 ~ % ps -ef | grep gitlab
  501   656   487   0  2:06pm ttys000    0:00.00 grep gitlab

logs:

2022/07/15 13:55:41.302136 Running module [StartGitlabRunner] (type: command, group: 6)
2022/07/15 14:05:41.601508 Successfully completed module [StartGitlabRunner] (type: command, group: 6) with message: successfully ran command [[/Users/ec2-user/.gitlab-runner/run.sh]] with stdout [Starting gitlab-runner process...
Started gitlab-runner process
process id: 431] and stderr []
2022/07/15 14:05:41.601820 Successfully completed processing of priority level 6
2022/07/15 14:05:41.601866 Writing instance history for instance i-03b4cf7416cbc2031...
2022/07/15 14:05:41.608182 Successfully wrote instance history
2022/07/15 14:05:41.608241 EC2 macOS Init completed in 10m14.065216375s

jahkeup commented 2 years ago

Thanks for reporting in and posting logs @rj93!

We suspect you're up against a separate (but related) issue that comes into play when processes spawn in the context of a Launchd Daemon (and/or Agents). Processes created by Launchd Daemons (directly or indirectly) must not make system calls that would spawn a whole other conceptual "daemon process" according to the "Required Behaviors" section of Launchd documentation. It is likely you'll need a separate Launchd Daemon instance that launches and owns the GitLab CI runner process (set up with your flags and environment as needed).

As written, the shared script creates a "background" process (from the gitlab-runner run ... & command) that belongs to the same session as ec2-macos-init and terminates along with the primary launchd process, i.e. ec2-macos-init.

@okudajun and I tried alternative ways of spawning processes from executed scripts and still found that they failed to stick around or failed to run at all (the usually trustworthy nohup had its syscalls fail as inappropriate ioctls). My knee-jerk suggestion would have been to try nohup, but it doesn't appear to work (as expected based on the docs).

Our team will need to look at how ec2-macos-init itself spawns and what it executes from new angles to determine if ec2-macos-init can transparently support or have helpers to support managing ad-hoc daemon processes. As written today, however, and based on the Launchd - Required Behavior documentation, ec2-macos-init cannot spawn daemon processes.


@rescaux - is there a process spawned in startup.sh that might want to daemonize? If so, you might need to spawn a Launchd Daemon (or multiple, dependent on your use case) in order to work around this Launchd limitation.
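
As a rough illustration of the separate launchd daemon suggested above, the sketch below writes a LaunchDaemon property list for the runner. The label com.example.gitlab-runner and the function name are assumptions; the binary path, user, and log file are taken from the run.sh posted earlier. Flags and environment would likely need adjusting for a real setup.

```shell
#!/bin/sh
# Sketch only: the label and function name are assumptions. The binary path,
# user, and log file come from the run.sh shown earlier in this thread.

# write_runner_daemon_plist <destination-path>
write_runner_daemon_plist() {
  cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.gitlab-runner</string>
  <key>UserName</key>
  <string>ec2-user</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/gitlab-runner</string>
    <string>run</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/Users/ec2-user/.gitlab-runner/runner.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/ec2-user/.gitlab-runner/runner.log</string>
</dict>
</plist>
EOF
}

# Example (as root, e.g. from user data):
#   write_runner_daemon_plist /Library/LaunchDaemons/com.example.gitlab-runner.plist
#   launchctl bootstrap system /Library/LaunchDaemons/com.example.gitlab-runner.plist
```

With a job like this, launchd itself owns the runner process, so it no longer depends on the lifetime of the ec2-macos-init run.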

rj93 commented 2 years ago

Hi @jahkeup, it may be worth me pointing out that this run.sh script is also called from userdata at instance launch and it keeps the process running once the ec2-macos-init process has completed.

Here's a snippet from my userdata:


# install gitlab-runner, etc
# ...

echo "Starting gitlab-runner..."
su - ec2-user -c '/Users/ec2-user/.gitlab-runner/run.sh'
echo "Started gitlab-runner"

# Won't run as part of this ec2-macos-init run, but will start gitlab-runner if the instance reboots
EC2_MACOS_INIT_FILE=/usr/local/aws/ec2-macos-init/init.toml
if ! grep -q "StartGitlabRunner" "$EC2_MACOS_INIT_FILE"
then
  echo "Adding group ec2-macos-init..."
  # Heredoc body and terminator are unindented: <<- strips leading tabs only, not spaces
  cat >> "$EC2_MACOS_INIT_FILE" <<EOF

### Group 6
## Custom services

[[Module]]
  Name = "StartGitlabRunner"
  PriorityGroup = 6
  RunPerBoot = true
  [Module.Command]
    Cmd = ["/Users/ec2-user/.gitlab-runner/run.sh"]
    RunAsUser = "ec2-user"
EOF
  echo "Added group to ec2-macos-init"
else
  echo "Skipping adding group to ec2-macos-init"
fi

echo "Finished userdata script"

I was hoping to get away without having to use launchd, but thanks for the feedback. I will go down that route instead.

Here are the logs from instance launch:

2022/07/19 01:21:51.024900 Fetching instance ID from IMDS...
2022/07/19 01:21:51.026363 Unable to get instance ID - IMDS may not be available yet...retrying every 1s [0/600]
2022/07/19 01:21:57.022153 Running on instance i-07d4615d1be0c53ed
2022/07/19 01:21:57.022320 Reading init config...
2022/07/19 01:21:57.036909 Successfully read init config
2022/07/19 01:21:57.037023 Validating config...
2022/07/19 01:21:57.037379 Successfully validated config
2022/07/19 01:21:57.037418 Prioritizing modules...
2022/07/19 01:21:57.037460 Successfully prioritized modules
2022/07/19 01:21:57.037490 Creating instance history directories for current instance...
2022/07/19 01:21:57.037975 Successfully created directories
2022/07/19 01:21:57.038017 Getting instance history...
2022/07/19 01:21:57.038339 Successfully gathered instance history
2022/07/19 01:21:57.038378 Processing priority level 1 (2 modules)...
2022/07/19 01:21:57.038720 Running module [UnmountLocalSSD] (type: command, group: 1)
2022/07/19 01:21:57.038735 Running module [DisableEthernet] (type: command, group: 1)
2022/07/19 01:21:57.104645 Successfully completed module [DisableEthernet] (type: command, group: 1) with message: successfully ran command [[/usr/sbin/networksetup -setnetworkserviceenabled Ethernet off]] with stdout [] and stderr []
2022/07/19 01:22:10.342104 Successfully completed module [UnmountLocalSSD] (type: command, group: 1) with message: successfully ran command [[/bin/zsh -c diskutil list internal physical | egrep -o '^/dev/disk\d+' | xargs diskutil eject || true]] with stdout [] and stderr [Volume failed to eject]
2022/07/19 01:22:10.342336 Successfully completed processing of priority level 1
2022/07/19 01:22:10.342381 Processing priority level 2 (1 modules)...
2022/07/19 01:22:10.342469 Running module [CheckNetworkIsUp] (type: networkcheck, group: 2)
2022/07/19 01:22:10.354944 Successfully completed module [CheckNetworkIsUp] (type: networkcheck, group: 2) with message: successfully pinged default gateway with a RTT of 490.791µs
2022/07/19 01:22:10.355089 Successfully completed processing of priority level 2
2022/07/19 01:22:10.355122 Processing priority level 3 (11 modules)...
2022/07/19 01:22:10.355200 Running module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3)
2022/07/19 01:22:10.355255 Running module [ManageEC2User] (type: usermanagement, group: 3)
2022/07/19 01:22:10.355296 Running module [UpdateMOTD] (type: motd, group: 3)
2022/07/19 01:22:10.355324 Running module [SetDefaultTimezone] (type: command, group: 3)
2022/07/19 01:22:10.356258 Running module [NeverSleep] (type: command, group: 3)
2022/07/19 01:22:10.355201 Running module [DisableWiFi] (type: command, group: 3)
2022/07/19 01:22:10.355215 Running module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3)
2022/07/19 01:22:10.356653 Running module [RemoveSSHGroup] (type: command, group: 3)
2022/07/19 01:22:10.355308 Running module [NeverSleepDisplay] (type: command, group: 3)
2022/07/19 01:22:10.356773 Running module [SetAmazonTimeSync] (type: command, group: 3)
2022/07/19 01:22:10.356919 Running module [DisableSleep] (type: command, group: 3)
2022/07/19 01:22:10.367955 Modified SSHD configuration, did not restart SSHD since it was not running
2022/07/19 01:22:10.368129 Successfully completed module [EC2SuggestedDefaultConfigSecurity] (type: systemconfig, group: 3) with message: system configuration completed with [1 changed / 0 unchanged / 0 error(s)] out of 1 requested changes
2022/07/19 01:22:10.449413 Successfully completed module [UpdateMOTD] (type: motd, group: 3) with message: successfully updated motd file [/etc/motd] with version string [macOS Monterey 12.4]
2022/07/19 01:22:10.450584 Modified sysctl property [kern.aiothreads=64]
2022/07/19 01:22:10.454907 Modified sysctl property [net.inet.tcp.win_scale_factor=8]
2022/07/19 01:22:10.458758 Modified sysctl property [net.inet.tcp.recvspace=1048576]
2022/07/19 01:22:10.459068 Modified sysctl property [net.inet.tcp.autosndbufmax=33554432]
2022/07/19 01:22:10.459163 Modified sysctl property [net.link.generic.system.rcvq_maxlen=1024]
2022/07/19 01:22:10.459251 Modified sysctl property [kern.aiomax=900]
2022/07/19 01:22:10.459293 Modified sysctl property [net.inet.tcp.autorcvbufmax=33554432]
2022/07/19 01:22:10.460907 Modified sysctl property [net.inet.tcp.sendspace=1048576]
2022/07/19 01:22:10.461173 Modified sysctl property [kern.aioprocmax=256]
2022/07/19 01:22:10.496966 Error while running module [DisableWiFi] (type: command, group: 3) with message:  and err: ec2macosinit: error executing command [[/bin/zsh -c wifidevice="$(networksetup -listallhardwareports | grep -A 1 "Wi-Fi" | tail -n 1 | cut -d " " -f2)"; if [[ ! -z $wifidevice ]]; then networksetup -setairportpower $wifidevice off; fi]] with stdout [Wireless is not available.
** Error: Wireless is not currently available on this machine.] and stderr []: exit status 9
2022/07/19 01:22:10.511551 Modified default [AutomaticCheckEnabled]
2022/07/19 01:22:10.511851 Modified default [ConfigDataInstall]
2022/07/19 01:22:10.518949 Successfully completed module [DisableSleep] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a disablesleep 1]] with stdout [] and stderr []
2022/07/19 01:22:10.520113 Successfully completed module [NeverSleep] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a sleep 0]] with stdout [] and stderr []
2022/07/19 01:22:10.526567 Successfully completed module [NeverSleepDisplay] (type: command, group: 3) with message: successfully ran command [[sudo pmset -a displaysleep 0]] with stdout [] and stderr []
2022/07/19 01:22:10.540609 Successfully completed module [RemoveSSHGroup] (type: command, group: 3) with message: successfully ran command [[/bin/zsh -c dscl /Local/Default delete /Groups/com.apple.access_ssh || true]] with stdout [] and stderr []
2022/07/19 01:22:10.549067 Modified default [CriticalUpdateInstall]
2022/07/19 01:22:10.549071 Modified default [AutomaticDownload]
2022/07/19 01:22:10.552830 Modified default [AutomaticallyInstallMacOSUpdates]
2022/07/19 01:22:10.552863 Successfully completed module [EC2SuggestedDefaultConfigPerformance] (type: systemconfig, group: 3) with message: system configuration completed with [14 changed / 0 unchanged / 0 error(s)] out of 14 requested changes
2022/07/19 01:22:10.623654 Successfully completed module [SetDefaultTimezone] (type: command, group: 3) with message: successfully ran command [[systemsetup -settimezone GMT]] with stdout [Set TimeZone: GMT] and stderr [2022-07-19 01:22:10.621 systemsetup[413:3903] ### Error:-99 File:/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/Admin/InternetServices.m Line:379]
2022/07/19 01:22:10.633880 Successfully completed module [SetAmazonTimeSync] (type: command, group: 3) with message: successfully ran command [[systemsetup -setusingnetworktime on -setnetworktimeserver 169.254.169.123]] with stdout [Network Time is already on.
setNetworkTimeServer: 169.254.169.123] and stderr [2022-07-19 08:22:10.633 systemsetup[428:3902] ### Error:-99 File:/AppleInternal/Library/BuildRoots/b6051351-c030-11ec-96e9-3e7866fcf3a1/Library/Caches/com.apple.xbs/Sources/Admin/InternetServices.m Line:379]
2022/07/19 01:22:11.623567 Successfully completed module [ManageEC2User] (type: usermanagement, group: 3) with message: successfully set secure password for ec2-user
2022/07/19 01:22:11.623616 Successfully completed processing of priority level 3
2022/07/19 01:22:11.623622 Processing priority level 4 (1 modules)...
2022/07/19 01:22:11.623637 Running module [GetSSHKeys] (type: sshkeys, group: 4)
2022/07/19 01:22:11.637203 Successfully completed module [GetSSHKeys] (type: sshkeys, group: 4) with message: successfully added 1 keys to authorized_users
2022/07/19 01:22:11.637225 Successfully completed processing of priority level 4
2022/07/19 01:22:11.637232 Processing priority level 5 (1 modules)...
2022/07/19 01:22:11.637245 Running module [ExecuteUserData] (type: userdata, group: 5)
2022/07/19 01:23:38.353725 Successfully completed module [ExecuteUserData] (type: userdata, group: 5) with message: successfully ran user data with stdout: [Starting userdata script...
Installing gitlab-runner...
==> Downloading https://ghcr.io/v2/homebrew/core/gitlab-runner/manifests/15.1.0
==> Downloading https://ghcr.io/v2/homebrew/core/gitlab-runner/blobs/sha256:29e742eddcdd1692ec3bc8ca847210bf9c7df47fc17f710b0924d412b92e8614
==> Downloading from https://pkg-containers.githubusercontent.com/ghcr1/blobs/sha256:29e742eddcdd1692ec3bc8ca847210bf9c7df47fc17f710b0924d412b92e8614?se=2022-07-19T08%3A30%3A00Z&sig=3HlrkSs9K96cBoYJgkAzI8KXAKuvGVHaITxEED4x5Qs%3D&sp=r&spr=https&sr=b&sv=2019-12-12
==> Pouring gitlab-runner--15.1.0.arm64_monterey.bottle.tar.gz
==> Caveats
To restart gitlab-runner after an upgrade:
  brew services restart gitlab-runner
Or, if you don't want/need a background service you can just run:
  /opt/homebrew/opt/gitlab-runner/bin/gitlab-runner run --syslog
==> Summary
🍺  /opt/homebrew/Cellar/gitlab-runner/15.1.0: 9 files, 60.4MB
==> Running `brew cleanup gitlab-runner`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).
Installed gitlab-runner
Registering gitlab-runner...
Registered gitlab-runner
Creating startup script...
Created statup script
Starting gitlab-runner...
Starting gitlab-runner process...
Started gitlab-runner process
Started gitlab-runner
Adding group ec2-macos-init...
Added group to ec2-macos-init
Finished userdata script
] and stderr: [Running `brew update --auto-update`...
==> Homebrew is run entirely by unpaid volunteers. Please consider donating:
  https://github.com/Homebrew/brew#donations

==> Auto-updated Homebrew!
Updated 3 taps (homebrew/core, homebrew/cask and aws/aws).
==> New Formulae
aztfy
bfgminer
cargo-bundle
cargo-nextest
cargo-udeps
circumflex
czg
dart-sdk
datatype99
docker-buildx
doggo
dooit
dumpling
dunamai
editorconfig-checker
evernote-backup
git-codereview
glib-utils
gnustep-base
helix
ijq
interface99
iptables
kt-connect
leapp-cli
levant
lexicon
lgeneral
libnetfilter_conntrack
libnftnl
libnl
libobjc2
libpython-tabulate
licensor
livekit
livekit-cli
llvm@13
lunar-date
mabel
manifest-tool
meek
metalang99
mkp224o
mle
mprocs
mypaint-brushes
neovide
nftables
ocl-icd
pax
pipe-rename
pixie
pixiewps
podman-compose
protobuf@3
python-build
qbe
qsv
railway
sgn
snowflake
tea
tremor-runtime
trzsz-go
ttdl
unisonlang
uthash
vectorscan
verapdf
webkitgtk
xdg-ninja
==> New Casks
archy
astrofox
bing-wallpaper
black-light
black-light-pro
cardinal
cleaneronepro
cloud189
cro-mag-rally
fertigt-slate
gama
gama-jdk
gamma-control
gyroflow
headlamp
juice
lemonlime
mbcord
opencore-patcher
plex-htpc
rockboxutility
squash
tdr-kotelnikov
tdr-nova
tdr-vos-slickeq
tmpdisk
ukrainian-typographic-keyboard
weektodo
wirecast
yousician

You have 5 outdated formulae installed.
You can upgrade them with brew upgrade
or list them with brew outdated.

Runtime platform                                    arch=arm64 os=darwin pid=1933 revision=76984217 version=15.1.0
WARNING: Running in user-mode.
WARNING: Use sudo for system-mode:
WARNING: $ sudo gitlab-runner...

Registering runner... succeeded                     runner=GR1348941HMXHduHd
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
]
2022/07/19 01:23:38.354026 Successfully completed processing of priority level 5
2022/07/19 01:23:38.354060 Writing instance history for instance i-07d4615d1be0c53ed...
2022/07/19 01:23:38.359768 Successfully wrote instance history
2022/07/19 01:23:38.359842 EC2 macOS Init completed in 1m41.332787209s

And the process is running:

ec2-user@ip-10-211-190-57 ~ % ps -ef | grep gitlab
  501  1944     1   0  8:23am ??         0:11.49 /opt/homebrew/bin/gitlab-runner run
  501  2581  2535   0  9:20am ttys000    0:00.00 grep gitlab

sourcedelica commented 1 year ago

Hi @rj93 - what did you end up doing? I'm trying to do the same thing (set up gitlab-runner to run at boot on a m2.metal instance) and ran into a different set of issues: I tried to use gitlab-runner install, which creates a launchd user agent that fails to start because there is no user session.

It is likely you'll need a separate Launchd Daemon instance that launches and owns the GitLab CI runner process (set up with your flags and environment as needed).

I'm confused about what is being recommended. I created a launch daemon in /Library/LaunchDaemons with basically the same contents as what gitlab-runner install created, plus UserName ec2-user. This seems to work. I'm assuming that the only issue with daemonization is running gitlab-runner via ec2-macos-init, and that running it as a regular launchd daemon is fine (lack of GitLab support notwithstanding). Does that sound right?

Thanks!

jahkeup commented 1 year ago

I created a launch daemon in /Library/LaunchDaemons that was basically the same contents of what gitlab-runner install created, plus UserName ec2-user.

Ah, sorry! You're right, yes, we are recommending creating a job (using user data) and then loading it to run under launchd. And yeah, that lack of support seems to me to be because of the particulars of creating and managing the LaunchDaemon and/or LaunchAgent. (@sourcedelica, skip to the bottom of this reply, then come back here!)

For general needs, it could be simple enough to run a LaunchDaemon in the system domain, but then your gitlab-runner could be missing out on some services. Also see the recent PR to configure the LaunchAgent for Actions Runner, which tunes exactly this. These are cases of dealing with launchd sessions and the services available to them, if I'm reading things right.

This means that a launchd job, say gitlab-runner, may not work for your CI because only a subset of services is available to that CI run when running outside of a gui/$uid session (e.g. the Dock, which only makes sense in a GUI session, or some security framework services). However, there are use cases that don't need the full graphical session. In other words, it depends 😅 - it depends on what you (and your invoked tools) need.
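
To see which situation an instance is in, one option is to ask launchd whether a gui/$uid domain exists for the user. This is a minimal sketch with made-up function names; it assumes launchctl print exits non-zero when the requested domain does not exist, and takes the numeric UID as an argument (for example, the output of id -u ec2-user).

```shell
#!/bin/sh
# Hypothetical helpers; function names are illustrative.

# has_gui_session <uid>
# Succeeds if launchd reports a gui/<uid> domain, i.e. the user has a
# graphical login session. Assumption: launchctl print exits non-zero
# when the domain does not exist.
has_gui_session() {
  launchctl print "gui/$1" >/dev/null 2>&1
}

# available_launchd_domain <uid>
# Prints gui/<uid> when a graphical session exists, otherwise system.
available_launchd_domain() {
  if has_gui_session "$1"; then
    echo "gui/$1"
  else
    echo "system"
  fi
}

# Example: available_launchd_domain "$(id -u ec2-user)"
```

On a freshly booted instance with nobody logged in, this would be expected to print system, which matches the LaunchAgent failing to start for lack of a user session.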

And, yeah, that means we're going to have to work with launchd jobs.

Some relevant links (n.b. in archive docs):


(maybe this works for you @sourcedelica ?)

There are some convenience service helpers in Homebrew for formulas to use. This works for formulas that define a service, and the gitlab-runner formula does!

Per https://docs.brew.sh/Manpage#services-subcommand something like this should do the trick:

#!/usr/bin/env sh
logger -s -t userdata "installing gitlab-runner"
brew install gitlab-runner
logger -s -t userdata "enabling, starting gitlab-runner service"
# re: "sudo" - see https://docs.brew.sh/Manpage#services-subcommand
sudo brew services start gitlab-runner

sourcedelica commented 1 year ago

I ended up getting the LaunchAgent installed by the GitLab runner working through some hackery. Basically this was:

  1. Set ec2-user as the auto-login user
  2. Set the user password and the auto-login password to the same value
  3. Reboot

When the machine reboots (and every boot thereafter) the ec2-user is auto-logged in and now the LaunchAgent associated with ec2-user works. Here are the last few commands in the user-data script:

plutil -replace autoLoginUser -string ${user} /Library/Preferences/com.apple.loginwindow.plist
sudo dscl . -passwd /Users/ec2-user "${pw}"
python3 set_kcpassword.py "${pw}"
reboot

Find set_kcpassword.py here