CGATOxford / cgat

Do not use - please refer to our newest code: https://github.com/cgat-developers/cgat-apps
BSD 3-Clause "New" or "Revised" License

pipeline_testing - #341

Closed Charlie-George closed 7 years ago

Charlie-George commented 7 years ago

I'm testing the peakcalling pipeline in the py3 environment. Each pipeline now seems to run individually and I've pushed those changes, but I get the following error at the checksum comparison step (compareCheckSums). Has anyone else come across it before? @sebastian-luna-valero @AndreasHeger ``

2017-06-27 19:14:02,518 INFO running statement:

cat test_peakcallingSEbroad.stats | cgat csv2db --retry --database-backend=sqlite --database-name=csvdb --database-host= --database-user= --database-password= --database-port=3306 --add-index=file --table=test_peakcallingSEbroad_results > test_peakcallingSEbroad_results.load

2017-06-27 19:14:11,261 ERROR 1 tasks with errors, please see summary below:

2017-06-27 19:14:11,261 WARNING could not get task information for compareCheckSums, no message sent

2017-06-27 19:14:11,262 ERROR 0: Task=compareCheckSums Error=io.UnsupportedOperation Job=[[test_peakcallingPEnarrow.stats,test_peakcallingPEnarrowIDR.stats,test_peakcallingPEnarrowIDRoracle.stats,test_peakcallingSEIDR.stats,test_peakcallingSEbroad.stats]->md5_compare.tsv]: (can't do nonzero end-relative seeks)

2017-06-27 19:14:11,262 ERROR full traceback is in pipeline.log

Traceback (most recent call last):
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/Pipeline/Control.py", line 943, in main
    checksum_level=options.ruffus_checksums_level,
  File "/ifs/devel/charlotteg/py35-v1/conda/lib/python3.5/site-packages/ruffus/task.py", line 5938, in pipeline_run
    raise job_errors
ruffus.ruffus_exceptions.RethrownJobError:

Original exception:

Exception #1
  'io.UnsupportedOperation(can't do nonzero end-relative seeks)' raised in ...
   Task = def compareCheckSums(...):
   Job  = [[test_peakcallingPEnarrow.stats, test_peakcallingPEnarrowIDR.stats, test_peakcallingPEnarrowIDRoracle.stats, test_peakcallingSEIDR.stats, test_peakcallingSEbroad.stats] -> md5_compare.tsv]

Traceback (most recent call last):
  File "/ifs/devel/charlotteg/py35-v1/conda/lib/python3.5/site-packages/ruffus/task.py", line 751, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/ifs/devel/charlotteg/py35-v1/conda/lib/python3.5/site-packages/ruffus/task.py", line 567, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/pipeline_testing.py", line 467, in compareCheckSums
    is_complete = IOTools.isComplete(logfile)
  File "/ifs/devel/charlotteg/py35-v1/cgat/CGAT/IOTools.py", line 181, in isComplete
    lastline = getLastLine(filename)
  File "/ifs/devel/charlotteg/py35-v1/cgat/CGAT/IOTools.py", line 103, in getLastLine
    f.seek(-1 * offset, 2)
io.UnsupportedOperation: can't do nonzero end-relative seeks

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/pipeline_testing.py", line 656, in <module>
    sys.exit(P.main(sys.argv))
  File "/ifs/devel/charlotteg/py35-v1/CGATPipelines/CGATPipelines/Pipeline/Control.py", line 1028, in main
    "pipeline failed with %i errors" % len(value.args))
ValueError: pipeline failed with 1 errors

``

AndreasHeger commented 7 years ago

This is a py3 issue, I have a fix for this that I need to push.

AndreasHeger commented 7 years ago

... actually already pushed, could you please git pull --rebase? Hopefully this will be fixed.

Charlie-George commented 7 years ago

Hmm, I've done that but it says I'm up to date. I guess there was some confusion when we merged with master? Should I roll back? If so, to which commit? I'm a bit confused by the history and by the point at which the fixes disappeared. Thanks

sebastian-luna-valero commented 7 years ago

Hi Charlie,

I agree, I could not see Andreas' fixes in the Py3-migration branches: https://github.com/CGATOxford/cgat/commits/Py3-migration https://github.com/CGATOxford/CGATPipelines/commits/Py3-migration

I found the same problem with Jenkins. I think the issue is with pipeline_testing.py trying to access a file (test_name.log) while the pipeline itself is writing to it, and therefore you get an IO error.

However, I might be wrong and Andreas can explain better.

Best regards, Sebastian

AndreasHeger commented 7 years ago

Hi, sorry about that.

If I recall, the next() needs to be replaced by readline().

The issue was that in py3 the file is an IOBuffer or similar and that does not have a next() method.

Best wishes, Andreas
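
For readers hitting the same error: the traceback above fails at `f.seek(-1 * offset, 2)`. In Python 3, a file opened in text mode rejects nonzero seeks relative to the end of the file and raises `io.UnsupportedOperation`, whereas a file opened in binary mode accepts them. The following is a minimal sketch of reading the last line under that assumption. It is not the fix that was pushed to the repository, and the name `get_last_line` and its parameters are hypothetical.

```python
import os


def get_last_line(filename, read_block_size=4096, encoding="utf-8"):
    """Return the last non-empty line of a file, without its newline.

    Hypothetical helper for illustration; not the CGAT IOTools code.
    """
    # Binary mode: Python 3 only allows a nonzero offset relative to
    # the end of the file (whence=2) on binary streams.
    with open(filename, "rb") as f:
        f.seek(0, os.SEEK_END)
        file_size = f.tell()
        data = b""
        offset = 0
        # Read backwards in growing steps until the buffer holds a
        # complete last line or the whole file has been read.
        while offset < file_size:
            offset = min(file_size, offset + read_block_size)
            f.seek(-offset, os.SEEK_END)
            data = f.read(offset)
            if b"\n" in data.rstrip(b"\n"):
                break
    lines = data.rstrip(b"\n").split(b"\n")
    return lines[-1].decode(encoding)
```

Decoding only after the bytes have been split on newlines keeps the sketch safe for multi-byte encodings such as UTF-8, because the last line is always read in full before it is decoded.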

sebastian-luna-valero commented 7 years ago

Thanks, Andreas.

I think there is an additional issue. The isComplete function will check whether the last line of both test_name.log and test_name/test_name.log starts with # job finished. However, in the case of test_name.log that will never be the case in the compareCheckSums task of pipeline_testing.py since the (meta-)pipeline has not finished yet. Instead, you should be checking the test_name/test_name.log file only, which is the log file for the pipeline being tested.

Best regards, Sebastian
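
The check described above can be sketched as follows, assuming only what the comment states: a log counts as complete when its last line starts with `# job finished`. The function name `is_complete` and the `marker` parameter are hypothetical and do not reproduce `IOTools.isComplete`.

```python
def is_complete(filename, marker="# job finished"):
    """Return True if the last non-empty line of the log starts with marker.

    Hypothetical helper for illustration only.
    """
    last_line = ""
    # Reading the whole file keeps the sketch simple; for large logs,
    # reading from the end of the file would be cheaper.
    with open(filename, "r") as f:
        for line in f:
            if line.strip():
                last_line = line
    return last_line.startswith(marker)
```

A log that is still being written by a running pipeline has no such final line yet, so a check like this returns False for it regardless of whether the run will eventually succeed.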

AndreasHeger commented 7 years ago

Hi @sebastian-luna-valero, this might be a bug, but note that I want to test ./test_name.log instead of test_name/pipeline.log, as the latter will also contain the log of the report building.

There is also the issue of testing several logs if there are multiple targets to be tested in a pipeline, see for example pipeline_annotations. Hopefully I pushed this correctly; I have the following snippet in my repository:

        logfiles = glob.glob(track + "*.log")
        job_finished = True
        for logfile in logfiles:
            is_complete = IOTools.isComplete(logfile)
            E.debug("logcheck: {} = {}".format(logfile, is_complete))
            job_finished = job_finished and is_complete

sebastian-luna-valero commented 7 years ago

Hi @AndreasHeger

Strange, I don't see any new commits on the Py3-migration branches yet.

The statement logfiles = glob.glob(track + "*.log") will return ['test_annotations.log', 'test_annotations.tgz.log'], so you're right that it won't pick up test_annotations.dir/pipeline.log. I think that file needs to be checked as well, since pipeline_testing.py may finish silently while the pipeline under test fails and leaves exceptions in test_annotations.dir/pipeline.log.

Moreover, I think you can't expect to find # job finished in the log while the compareCheckSums task of pipeline_testing.py is still running.
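
To illustrate the point about the glob pattern: `*` in `glob.glob` does not match across directory separators, so `track + "*.log"` never reaches into `<track>.dir/`. A minimal sketch of collecting both sets of logs follows. The helper name `collect_logfiles` is hypothetical, and the `<track>.dir/pipeline.log` layout is taken from the paths quoted in this comment rather than from the repository code.

```python
import glob
import os


def collect_logfiles(track):
    """Return the top-level logs for a test plus the tested pipeline's log.

    Hypothetical helper; assumes the "<track>.dir/pipeline.log" layout
    mentioned in this thread.
    """
    # Matches e.g. test_annotations.log and test_annotations.tgz.log,
    # but nothing inside the test_annotations.dir directory.
    logfiles = sorted(glob.glob(track + "*.log"))
    # Add the log written by the pipeline under test, if present.
    pipeline_log = os.path.join(track + ".dir", "pipeline.log")
    if os.path.exists(pipeline_log):
        logfiles.append(pipeline_log)
    return logfiles
```

Whether the nested log should take part in the completeness check is exactly the open question in this thread, so the sketch only shows how it could be included.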

AndreasHeger commented 7 years ago

Thanks, let us talk on Monday.

I think I saw the changes on GitHub, but maybe I put them in the wrong branch?

Best wishes, Andreas

AndreasHeger commented 7 years ago

Apologies, I forgot to push the changes to the CGAT repository; I had only pushed to CGATPipelines.

Just pushed!

sebastian-luna-valero commented 7 years ago

Thanks for fixing @AndreasHeger