CoBrALab / qbatch

The Unlicense

Convert test equality to sets to allow out-of-order execution #131

Closed gdevenyi closed 7 years ago

gdevenyi commented 8 years ago

Convert tests to use sets.

I'll manually re-run Travis a few times to see whether this fixes the random errors.

gdevenyi commented 8 years ago

Hrm, I don't understand the failure mode here. By eye the outputs look the same, and I'm now using set equality, though I suspect the set isn't being constructed correctly when splitting on \n.
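
A minimal sketch of the order-insensitive comparison described above. The helper name `lines_as_set` is hypothetical and is not taken from qbatch's test suite; it assumes the output may arrive as either bytes or str, which is one way the set could end up built incorrectly.

```python
def lines_as_set(output):
    """Return the non-empty lines of a command's output as a set."""
    # Popen pipes yield bytes unless text mode is enabled, so decode first.
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    # splitlines() handles both "\n" and "\r\n" and leaves no trailing empty item.
    return {line for line in output.splitlines() if line.strip()}


expected = "echo 1\necho 2\necho 3\n"
actual = b"echo 3\necho 1\necho 2\n"  # same lines, different order
assert lines_as_set(expected) == lines_as_set(actual)
```

Note that a set comparison also ignores duplicated lines, so it only checks which distinct lines appeared, not how many times.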

gdevenyi commented 8 years ago

Blah, splitting didn't fix it :(

Still works locally as well

pipitone commented 8 years ago

For whatever reason, one command output is being dropped, e.g:

AssertionError: Chunk 3: Expected echo 20   20
echo 21 21
echo 22 22
echo 23 23
echo 24 24
echo 25 25
echo 26 26
echo 27 27 <<---- this guy is missing
echo 28 28
echo 29 29
 but got echo 20    20
echo 21 21
echo 22 22
echo 23 23
echo 24 24
echo 25 25
echo 26 26
echo 28 28
echo 29 29

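
Spotting a single dropped line by eye is error-prone, so a test could report it directly. The following is a sketch only: `line_diff` is a hypothetical helper, not part of qbatch, and the sample data merely mimics the shape of the failure shown above.

```python
from collections import Counter


def line_diff(expected, actual):
    """Return (missing, unexpected) line lists between two outputs."""
    # Counter keeps multiplicities, so duplicated lines are not hidden.
    want = Counter(expected.splitlines())
    got = Counter(actual.splitlines())
    missing = sorted((want - got).elements())
    unexpected = sorted((got - want).elements())
    return missing, unexpected


expected = "\n".join("echo {0} {0}".format(i) for i in range(20, 30))
actual = "\n".join("echo {0} {0}".format(i) for i in range(20, 30) if i != 27)
print(line_diff(expected, actual))  # (['echo 27 27'], [])
```
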
gdevenyi commented 8 years ago

Hrm, good catch, too bad I have no idea why it misses it :-/

pipitone commented 8 years ago

Yes, and why it doesn't miss it locally.

One thing we can do to simplify the code and work around the string decoding issues is to have subprocess decode its output as text by default, via:

 def command_pipe(command):
-    return Popen(shlex.split(command), stdin=PIPE, stdout=PIPE, stderr=PIPE)
+    return Popen(shlex.split(command), stdin=PIPE, stdout=PIPE, stderr=PIPE,
+                 universal_newlines=True)

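
To illustrate what the change buys, here is a small sketch assuming Python 3. The child command is a stand-in run through `sys.executable` rather than anything qbatch actually launches, and `run_both_modes` is a hypothetical name.

```python
import sys
from subprocess import PIPE, Popen


def run_both_modes():
    """Run the same child twice: once in bytes mode, once in text mode."""
    # Stand-in child process; sys.executable avoids depending on external tools.
    cmd = [sys.executable, "-c", "print('echo 1')"]
    raw_out, _ = Popen(cmd, stdout=PIPE, stderr=PIPE).communicate()
    text_out, _ = Popen(cmd, stdout=PIPE, stderr=PIPE,
                        universal_newlines=True).communicate()
    return raw_out, text_out


raw_out, text_out = run_both_modes()
assert isinstance(raw_out, bytes)   # default: undecoded bytes
assert isinstance(text_out, str)    # text mode: decoded, newlines normalized
```

With text mode on, tests can compare against plain string literals without an explicit decode step.
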
pipitone commented 8 years ago

Well, I've been re-running the Travis build on my own branch of this PR (https://github.com/pipitone/qbatch/tree/set-comparison-test). It seems to fail less than 25% of the time and I haven't been able to replicate it locally, but when I remove --line-buffer from the parallel command, I don't seem to get failures...

gdevenyi commented 8 years ago

I wonder if it's a bug in parallel.

Perhaps we should pull down the latest version to check? It's just an untar to install.

pipitone commented 8 years ago

Yeah... I reproduced the Travis environment (Ubuntu 14.04 with parallel 20130922) and I get the same failure when using --line-buffer, and also when I upgrade to parallel 20140122 (the version shipped with 16.04), although it maybe happens less frequently? Interestingly, I'm not seeing the failure using 20160822 on 14.04.

blah.

pipitone commented 8 years ago

Scratch that. Just got a failure on 14.04 with 20160822.

gdevenyi commented 8 years ago

I'm starting to wonder if this is an interaction problem between .communicate and parallel.

It seems people sometimes have issues with Popen/communicate and missing lines...

pipitone commented 8 years ago

Do you have some pointers to where this problem is discussed?

On Sep 1, 2016, at 11:20 AM, "Gabriel A. Devenyi" notifications@github.com wrote:

I'm starting to wonder if this is an interaction problem between .communicate and parallel.

It seems people sometimes have issues with Popen/communicate and missing lines...


gdevenyi commented 8 years ago

https://stackoverflow.com/questions/2432556/unbuffered-subprocess-output-last-line-missing

https://stackoverflow.com/questions/10689998/python-subprocess-losing-10-of-a-programs-stdout
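
One way to separate the two suspects is to take parallel out of the picture and check whether Popen/communicate alone ever drops lines. The harness below is a sketch under that assumption: the function names are hypothetical, and the child is a plain Python process, so a clean run here says nothing about parallel itself.

```python
import sys
from subprocess import PIPE, Popen


def count_lines_via_communicate(n_lines):
    """Run a child that prints n_lines lines; return how many arrive."""
    child = "for i in range({0}): print('echo', i, i)".format(n_lines)
    proc = Popen([sys.executable, "-c", child], stdout=PIPE, stderr=PIPE,
                 universal_newlines=True)
    out, _ = proc.communicate()
    return len(out.splitlines())


def find_dropped_runs(n_lines=100, repeats=20):
    """Return the indices of runs in which fewer lines arrived than were printed."""
    return [run for run in range(repeats)
            if count_lines_via_communicate(n_lines) != n_lines]
```

If `find_dropped_runs()` stays empty over many repeats while the same loop around the parallel command does not, that points away from communicate and toward parallel.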

pipitone commented 8 years ago

Hmm.. interesting. I'll take a closer look. I wonder what changed in 16.04... ;-)

How badly do you want --line-buffer? It still seems to me that if you want "live" output from your commands, you can just handle redirection to a log file yourself (not friendly, but also not difficult).
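
As a rough illustration of handling the redirection yourself, the sketch below rewrites a list of commands so that each one writes to its own log file. `with_log_redirect` and the `cmd_N.log` naming are hypothetical, and it assumes each entry is a single simple shell command (no pipelines or existing redirects).

```python
import os
import shlex


def with_log_redirect(commands, log_dir):
    """Append a per-command stdout/stderr redirect to each shell command."""
    redirected = []
    for index, command in enumerate(commands):
        log_path = os.path.join(log_dir, "cmd_{0}.log".format(index))
        # Quote the path so spaces in log_dir do not break the command line.
        redirected.append("{0} > {1} 2>&1".format(command, shlex.quote(log_path)))
    return redirected


for line in with_log_redirect(["echo 20 20", "echo 21 21"], "logs"):
    print(line)
```

Because each command writes straight to its own file, its output is visible while the job is still running, independent of how parallel buffers stdout.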

gdevenyi commented 8 years ago

I feel pretty strongly on this.

Right now, common usage of qbatch the way it's intended results in empty log files if the job runs into batch system issues. This is very user-unfriendly.

pipitone commented 8 years ago

Fair point. I'm not sure what to do really. I'll keep investigating.

As a stop-gap, we could use --files or --results to get parallel to dump to files for us rather than a single shared log file. Thoughts?

pipitone commented 8 years ago

@gdevenyi what do you think about using --files or --results with parallel so that it dumps individual log files? I'd vote doing that, or just leaving it up to the users to write their own output redirection if they really want realtime output.

gdevenyi commented 7 years ago

After much testing I have determined that the problem is that GNU parallel sometimes loses lines when run with --line-buffer.

Tested in the latest version and it looks like it's still there. Going to attempt to report a bug.