appcanary / agent

The Appcanary agent
https://www.appcanary.com
GNU General Public License v3.0
47 stars 6 forks source link

weird timing issue with file_test.go / remove tmpfile debugging #31

Closed phillmv closed 9 years ago

phillmv commented 9 years ago

About 20% of the time TestWatchFileHookLoop will just hang until the go test utility kills the test because of a timeout.

The remainder of the time it works as intended.

Current hypothesis

The timer in the inner go routine won't fire For Reasons.

That's as far as I've been able to determine; the test will run, all of the right files get written to, the callbacks fire - except for the timers. I've narrowed down on the timers because, on the trial runs where the test fails, the debug "print" statements near the timer code never show up.

On successful runs, the output file becomes filled with "microloop" print statements.

If on the failing runs the loop never writes to invokedCount, that would explain why the test hangs and never exits.

(Not only that, but hanging tests won't print to stdout, thus forcing me to resource to all the write calls to /tmp/wtf as a means of getting debugging values)

reproduce

I can reproduce this some of the time by running the test:

make testr t=WatchFileHookLoop

and in a separate terminal tab,

touch /tmp/wtf && tail -f /tmp/wtf

Once this gets cleaned up, it'd be real smart to get rid of all of the /tmp/wtf calls.

phillmv commented 9 years ago

Ya sorry could've been more obvious

On May 19, 2015, at 19:17, Andrey Petrov notifications@github.com wrote:

Ooooh WIP branch, nm.

— Reply to this email directly or view it on GitHub.

shazow commented 9 years ago

I don't think you need the goroutine here.

I'd replace the testcb with something like:

    counter := 0
    testcb := func(_ *WatchedFile) {
        counter++
        cbInvoked <- true
    } 

It's technically a race condition but you can wrap a lock around it later if you want. Or maybe make the counter a native part of the WatchedFile struct, seems like a useful metric to keep.

I think the amount of complexity you're adding here by chaining channels is obfuscating the problem (or even causing the problem).

phillmv commented 9 years ago

Problem turned out to be an unbuffered channel allied to a mysterious extra write; that 20% of the time the program gets an extra write, causing the counter to move to 7.

Because the channel was unbuffered, during these mystery writes we end up hanging the program.

The cause of the hang established, we replaced it with your loop ;)