ANTS-Framework / ants

ANTS is a framework to manage and apply macOS and Linux host configurations using Ansible Pull.
GNU General Public License v3.0
41 stars 8 forks source link

Ants log file should update in real time #17

Open knightsc opened 6 years ago

knightsc commented 6 years ago

After the ANTS Framework is first installed I touch the run now file to kick off an initial run. I then go and tail /var/log/ants/ants.log and get the following:

Jul 19 2018 14:29:32 EDT    Starting Ansible Pull at 2018-07-19 14:29:32
Jul 19 2018 14:29:32 EDT    /Library/ANTS-Framework/bin/ansible-pull --clean -f -i /var/root/.ants_playbook/hosts -d /var/root/.ants_playbook -U repo.git -C branch site.yml --private-key /etc/ants/id_ants

And the log just stays like that 13 minutes. My playbook uses the macos_profile ANTs role and during this time I can see the profiles installed one by one in system preferences. Finally after the run is finished the ants.log file show the output for the rest of the run.

Jul 19 2018 14:43:15 EDT    <host> | SUCCESS => {
Jul 19 2018 14:43:15 EDT        "after": "27a58dd9f5545a072be45d0d496f8d4b8d90e28d",
Jul 19 2018 14:43:15 EDT        "before": "27a58dd9f5545a072be45d0d496f8d4b8d90e28d",
Jul 19 2018 14:43:15 EDT        "changed": false,
Jul 19 2018 14:43:15 EDT        "remote_url_changed": false
Jul 19 2018 14:43:15 EDT    }

I briefly discussed this in the #ansible channel in MacAdmins slack here: https://macadmins.slack.com/archives/C06G3E39P/p1528319226000036

I didn't get a chance to look into it in more detail then so creating this issue so I can take another look.

JanWelker commented 6 years ago

Hi @knightsc ,

thanks for your detailed problem description. Ansible also counts the random delay, we are using to prevent DDoS behavior of the AD and GIT infrastructure, when it measures the run-time. This might explain your long run-time. We add a random delay with 15min max when running as launch daemon. We do not set the random delay if you trigger ANTS on the CLI.

The reason you only see the output in the log after each task finished is due to the way we write the log. I will discuss this with @aschwanb. There might be a nicer way to do this which will result in an "realtime" log.

Cheers, Jan

aschwanb commented 6 years ago

I hope you're using tail -f but joking aside.

From the output of your log file it looks to me as if the run triggered by ants.run-now is only logging the beginning of the run and ants -w is logging the rest. This would explain why there is no change in the git checkout.

For context: Loading both LaunchDaemons will result in triggering ants -w. Touching the file at /Users/Shared/.ch.unibas.its.cs.ants.run-now will trigger ants without delay.

There we have a situation where both processes are writing to the same log files. We're using the python logging library and I've to test if there is a potential problem when two processes are using the same file.

Am I correct in assuming that you're loading both LaunchDaemons after the pkg is installed and you're not doing a restart? If so, I'll test that scenario.

aschwanb commented 6 years ago

I did some basic testing on macOS 10.13.6: Open three terminal windows. One with sudo ants -w another with sudo ants and a third with tail -f /var/log/ants/ants.log. The log shows the output of sudo ants.

@knightsc Can you run the same test and confirm my results? I think we might need the detailed steps on how to reproduce your issue.

aschwanb commented 6 years ago

I also ran touch /Users/Shared/.ch.unibas.its.cs.ants.run-now && tail -f /var/log/ants/ants.log and could see the log in "real time" - with the delay of the python subprocess output, which is still less than the 13 minutes you experienced.

I think there might be a problem during the bootstrapping procedure when ants is just installed.

knightsc commented 6 years ago

I am definitely using tail -f. I will repeat some of my testing on a real machine rather than a virtual machine to see if that makes any difference

aschwanb commented 6 years ago

I just did the same test as above in my 10.14 VM and coudn't reproduce the behaviour. Can you check if the issue is with the run-now LaunchDaemon?

knightsc commented 6 years ago

I just tested touch /Users/Shared/.ch.unibas.its.cs.ants.run-now && tail -f /var/log/ants/ants.log on my physical 10.13.6 machine and see the same behavior. Not as long as a delay as the VM. But the start of the run has the timestamp Jul 26 2018 08:58:57 EDT and then every single line of the output log has the timestamp Jul 26 2018 08:59:51 EDT. It just doesn't seem possible for everything to be happening at the exact same time. In your log files are there different timestamps on the out?

I also test my playbook changes locally using a regular playbook command like this:

sudo ansible-playbook -i hosts -c local -lhostnamesite.yml

And when I do the output scrolls by like I expect on the console.

When you say check if the issue is with the run-now launch daemon what should I be looking for?

knightsc commented 6 years ago

This is the plist for the run-now daemon

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>KeepAlive</key>
        <dict>
            <key>PathState</key>
            <dict>
                <key>/Users/Shared/.ch.unibas.its.cs.ants.run-now</key>
                <true/>
            </dict>
        </dict>
        <key>Label</key>
        <string>ch.unibas.its.cs.ants.run-now</string>
        <key>OnDemand</key>
        <true/>
        <key>ProgramArguments</key>
        <array>
            <string>bash</string>
            <string>-c</string>
            <string>/Library/ANTS-Framework/bin/ants; /bin/rm /Users/Shared/.ch.unibas.its.cs.ants.run-now</string>
        </array>
        <key>StandardErrorPath</key>
        <string>/var/log/system.log</string>
        <key>StandardOutPath</key>
        <string>/var/log/system.log</string>
    </dict>
</plist>

Interestingly enough it looks like the output from a run also gets put into /var/log/system.log (my guess because of the daemon) and even in that file I see all the info show up at once at the end of the running rather than all throughout the run

aschwanb commented 6 years ago

@knightsc Yes, this is the daemon I was refering to. I could have given the full path, sorry.

When you first opened the issue, I thought it was about a delay of 13 minutes for an ants run triggered by touching /Users/Shared/.ch.unibas.its.cs.ants.run-now. This is very different from logs all having the same timestamp but occuring within one minute after the start of the run. Is the original issue still present or was it about the timestampes all along?

knightsc commented 6 years ago

Original issue is there. Definitely after a trigger a run, if I tail the ants.log i get no output until the run is completely finished and all entries show at once. I think they all have the same timestamp because they're all getting printed at once at the end rather than being printed as the run progresses and having different timestamp values

@aschwanb If you check the logs of one of your runs is all the ansible output with the same timestamp?

aschwanb commented 6 years ago

They don't all have the same timestamp. But there are definitely blocks of output. Eg. one block where all lines have timestap x, then a block with timestamp y. I think the reason for this is the way python buffers and outputs lines from subprocess.Popen.

The reason we're using subprocess instead of importing ansible directly is here. The relevant code starts here:

proc = subprocess.Popen(cmd, bufsize=1, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

and

for line in iter(proc.stdout.readline, ''):
    logger.write_log(line, task_line)

You can maybe take a look at the bufsize options and see if you can improve on the output speed.