Unit tests fail for 2.6.3 in Python3.2

ghost commented 9 years ago

I just ran the unit tests for 2.6.3 (using Python 3.2) and the following two unit tests fail:

Running test_split_subdivide_checkpointing.py
     Run pipeline normally...
     Check that running again does nothing. (All up to date).
     Running again with forced tasks to generate more files...
     Check that running again does nothing. (All up to date).
     Running again with forced tasks to generate even more files...
E     Run pipeline normally...
     Check that running again does nothing. (All up to date).
     Running again with forced tasks to generate more files...
     Check that running again does nothing. (All up to date).
     Running again with forced tasks to generate even more files...
E
======================================================================
ERROR: test_newstyle_ruffus (test_split_subdivide_checkpointing.Test_ruffus)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_split_subdivide_checkpointing.py", line 181, in test_newstyle_ruffus
    expected_files_after_4_runs)
  File "test_split_subdivide_checkpointing.py", line 136, in check_file_exists_or_not_as_expected
    raise Exception("Expected file %s" % (tempdir + ee))
Exception: Expected file test_split_subdivide_checkpointing/2.split

======================================================================
ERROR: test_ruffus (test_split_subdivide_checkpointing.Test_ruffus)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_split_subdivide_checkpointing.py", line 225, in test_ruffus
    expected_files_after_4_runs)
  File "test_split_subdivide_checkpointing.py", line 136, in check_file_exists_or_not_as_expected
    raise Exception("Expected file %s" % (tempdir + ee))
Exception: Expected file test_split_subdivide_checkpointing/2.split

----------------------------------------------------------------------
Ran 2 tests in 5.875s

FAILED (errors=2)

After following this "hint" ;-) in the code...

def tearDown(self):
        # only tear down if not throw exception so we can debug?
        try:
            shutil.rmtree(tempdir)
        except:
            pass

I can confirm that some of the files are created as expected (well, obviously):

0.split
0_0.subdivided
0_1.subdivided
1.split
1_0.subdivided
start

However, right now I fail to see where the whole thing breaks... Please let me know if you need more information. Best Peter

bunbun commented 9 years ago

Dear Peter, This is very strange and rather worrying: The test runs perfectly on python3.2 on my box and in Travis-CI.

Would you mind rerunning the tests but increasing the verbosity to 6 for the last two calls to pipeline_run() to show what is happening? These are on lines 221 and 227. Change:

    pipeline_run(forcedtorun_tasks = [make_start], multiprocess = 10, verbose=0, pipeline= "main")
    ...
    pipeline_run(multiprocess = 10, verbose=0, pipeline= "main")

to

    pipeline_run(forcedtorun_tasks = [make_start], multiprocess = 10, verbose=6, pipeline= "main")
    ...
    pipeline_run(multiprocess = 10, verbose=6, pipeline= "main")

ghost commented 9 years ago

Hi Leo, sure, below is the (lengthy) output. Is it possible that this is caused by something in the "environment"; e.g., I was thinking about the file system and added a time.sleep(1) after each file creation, but the test is still failing. Best Peter

Running test_split_subdivide_checkpointing.py
     Run pipeline normally...
_______________________________________
Tasks which will be run:

Task enters queue = 'test::test_split_subdivide_checkpointing.make_start' 
    -> start
    Job  = [None -> test_split_subdivide_checkpointing/start] ...
        Missing file [test_split_subdivide_checkpointing/start] 
    Job  = [None -> test_split_subdivide_checkpointing/start] completed
Completed Task = 'test::test_split_subdivide_checkpointing.make_start' 
Task enters queue = 'test::test_split_subdivide_checkpointing.split_start' 
    -> XXX.split where XXX = 0 .. N, N   = previous N + 1
    Job  = [[test_split_subdivide_checkpointing/start] -> test_split_subdivide_checkpointing/*.split] Missing output file 
    Job  = [[test_split_subdivide_checkpointing/start] -> test_split_subdivide_checkpointing/*.split] completed
Completed Task = 'test::test_split_subdivide_checkpointing.split_start' 
Task enters queue = 'test::test_split_subdivide_checkpointing.subdivide_start' 
    Job  = [test_split_subdivide_checkpointing/0.split -> test_split_subdivide_checkpointing/0_*.subdivided, test_split_subdivide_checkpointing/0] Missing output file 
    Job  = [test_split_subdivide_checkpointing/0.split -> test_split_subdivide_checkpointing/0_*.subdivided, test_split_subdivide_checkpointing/0] completed
Completed Task = 'test::test_split_subdivide_checkpointing.subdivide_start' 
     Check that running again does nothing. (All up to date).
________________________________________
Tasks which are up-to-date:

Task = 'test::test_split_subdivide_checkpointing.make_start'  
    "-> start"
       Job  = [None
             -> test_split_subdivide_checkpointing/start]

Task = 'test::test_split_subdivide_checkpointing.split_start'  
    "-> XXX.split where XXX = 0 .. N, N   = previous N + 1"
       Job  = [[test_split_subdivide_checkpointing/start]
             -> test_split_subdivide_checkpointing/*.split]

Task = 'test::test_split_subdivide_checkpointing.subdivide_start'  
       Job  = [test_split_subdivide_checkpointing/0.split
             -> test_split_subdivide_checkpointing/0_*.subdivided
              , test_split_subdivide_checkpointing/0]
________________________________________

________________________________________
Tasks which will be run:
     Running again with forced tasks to generate more files...
________________________________________
Tasks which will be run:

Task enters queue = 'test::test_split_subdivide_checkpointing.make_start' : Forced to rerun
    -> start
    force task Job  = [None -> test_split_subdivide_checkpointing/start] to rerun 
    Job  = [None -> test_split_subdivide_checkpointing/start] completed
Completed Task = 'test::test_split_subdivide_checkpointing.make_start' 
Task enters queue = 'test::test_split_subdivide_checkpointing.split_start' 
    -> XXX.split where XXX = 0 .. N, N   = previous N + 1
    Job  = [[test_split_subdivide_checkpointing/start] -> test_split_subdivide_checkpointing/*.split] unnecessary: already up to date 
Uptodate Task = 'test::test_split_subdivide_checkpointing.split_start'
Task enters queue = 'test::test_split_subdivide_checkpointing.subdivide_start' 
    Job  = [test_split_subdivide_checkpointing/0.split -> test_split_subdivide_checkpointing/0_*.subdivided, test_split_subdivide_checkpointing/0] unnecessary: already up to date 
Uptodate Task = 'test::test_split_subdivide_checkpointing.subdivide_start'
test_split_subdivide_checkpointing/
E     Run pipeline normally...
________________________________________
Tasks which will be run:

Task enters queue = 'test_split_subdivide_checkpointing.make_start' 
    -> start
    Job  = [None -> test_split_subdivide_checkpointing/start] ...
        Missing file [test_split_subdivide_checkpointing/start] 
    Job  = [None -> test_split_subdivide_checkpointing/start] completed
Completed Task = 'test_split_subdivide_checkpointing.make_start' 
Task enters queue = 'test_split_subdivide_checkpointing.split_start' 
    -> XXX.split where XXX = 0 .. N, N   = previous N + 1
    Job  = [[test_split_subdivide_checkpointing/start] -> test_split_subdivide_checkpointing/*.split] Missing output file 
    Job  = [[test_split_subdivide_checkpointing/start] -> test_split_subdivide_checkpointing/*.split] completed
Completed Task = 'test_split_subdivide_checkpointing.split_start' 
Task enters queue = 'test_split_subdivide_checkpointing.subdivide_start' 
    Job  = [test_split_subdivide_checkpointing/0.split -> test_split_subdivide_checkpointing/0_*.subdivided, test_split_subdivide_checkpointing/0] Missing output file 
    Job  = [test_split_subdivide_checkpointing/0.split -> test_split_subdivide_checkpointing/0_*.subdivided, test_split_subdivide_checkpointing/0] completed
Completed Task = 'test_split_subdivide_checkpointing.subdivide_start' 
     Check that running again does nothing. (All up to date).
________________________________________
Tasks which are up-to-date:

Task = 'test_split_subdivide_checkpointing.make_start'  
    "-> start"
       Job  = [None
             -> test_split_subdivide_checkpointing/start]

Task = 'test_split_subdivide_checkpointing.split_start'  
    "-> XXX.split where XXX = 0 .. N, N   = previous N + 1"
       Job  = [[test_split_subdivide_checkpointing/start]
             -> test_split_subdivide_checkpointing/*.split]

Task = 'test_split_subdivide_checkpointing.subdivide_start'  
       Job  = [test_split_subdivide_checkpointing/0.split
             -> test_split_subdivide_checkpointing/0_*.subdivided
              , test_split_subdivide_checkpointing/0]

____________________________________
________________________________________
Tasks which will be run:
     Running again with forced tasks to generate more files...
________________________________________
Tasks which will be run:

Task enters queue = 'test_split_subdivide_checkpointing.make_start' : Forced to rerun
    -> start
    force task Job  = [None -> test_split_subdivide_checkpointing/start] to rerun 
    Job  = [None -> test_split_subdivide_checkpointing/start] completed
Completed Task = 'test_split_subdivide_checkpointing.make_start' 
Task enters queue = 'test_split_subdivide_checkpointing.split_start' 
    -> XXX.split where XXX = 0 .. N, N   = previous N + 1
    Job  = [[test_split_subdivide_checkpointing/start] -> test_split_subdivide_checkpointing/*.split] unnecessary: already up to date 
Uptodate Task = 'test_split_subdivide_checkpointing.split_start'
Task enters queue = 'test_split_subdivide_checkpointing.subdivide_start' 
    Job  = [test_split_subdivide_checkpointing/0.split -> test_split_subdivide_checkpointing/0_*.subdivided, test_split_subdivide_checkpointing/0] unnecessary: already up to date 
Uptodate Task = 'test_split_subdivide_checkpointing.subdivide_start'
test_split_subdivide_checkpointing/
E
======================================================================
ERROR: test_newstyle_ruffus (test_split_subdivide_checkpointing.Test_ruffus)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_split_subdivide_checkpointing.py", line 171, in test_newstyle_ruffus
    expected_files_after_3_runs)
  File "test_split_subdivide_checkpointing.py", line 140, in check_file_exists_or_not_as_expected
    raise Exception("Expected file %s" % (tempdir + ee))
Exception: Expected file test_split_subdivide_checkpointing/1.split
======================================================================
ERROR: test_ruffus (test_split_subdivide_checkpointing.Test_ruffus)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_split_subdivide_checkpointing.py", line 215, in test_ruffus
    expected_files_after_3_runs)
  File "test_split_subdivide_checkpointing.py", line 140, in check_file_exists_or_not_as_expected
    raise Exception("Expected file %s" % (tempdir + ee))
Exception: Expected file test_split_subdivide_checkpointing/1.split
----------------------------------------------------------------------
Ran 2 tests in 10.950s

FAILED (errors=2)

bunbun commented 9 years ago

Something very strange seems to be happening. For some strange reason make_start() is not updating its output file times. This may be just because of a combination of interactions between timestamping and poor file system time granularities.

I have updated Ruffus to add (even more) status messages for up to date tasks.

Could you do a git pull and run the unittest with an extra "-v" on the command line and send me the output?

python test/test_split_subdivide_checkpointing.py -v

Thanks

Leo

ghost commented 9 years ago

Sorry that it took me so long to re-run the tests - busy days... I ran the checkout version 1202 of the test on three different file systems (I can send you the output via mail if you want, it's a bit too long to post it here). Bottom line:

regular NFS share: fails
high performance cluster file system: fails (this was unexpected...)
local disk (/tmp on one of the cluster nodes): success

So, right now, it appears to be related to high latency of the file system; however, I added again a time.sleep(1) after each file creation step and re-tested on our cluster file system, it still fails (there is no load on the file servers as I write these lines). This was surprising, so I increased from 1 second to 3 seconds, yet the tests are still failing. That leaves me puzzled; is adding time.sleep not the right way of going after this? Or should I talk to our IT guys, because the latency of our cluster file system is expected to be lower than 1s when there is no high load? So I guess you can close this as "not a Ruffus issue", but I would still like to hear your opinion on that. Thanks

Peter

bunbun commented 9 years ago

Dear Peter, It is obviously a ruffus issue if Ruffus is not working properly :-) The thing that we have to work out is what the latency means 1) The created files are not appearing immediately 2) The created files do not have the correct file times 3) The timestamps have extremely poor resolution 4) Your cluster head node system time is out of sync with that of the disk servers. Could you send me the full output via mail please? My email address is ruffus at llew.org.uk

Thanks

Leo

cgat-developers / ruffus

Unit tests fail for 2.6.3 in Python3.2 #49