cgat-developers / cgat-flow

multiple P.run() in one function failing #90

Closed Acribbs closed 5 years ago

Acribbs commented 5 years ago

@AndreasHeger have there been any changes recently in ruffus or cgat-core that would change how P.run() executes statements?

In bamstats I have this function:

@follows(mkdir("nreads.dir"))
@transform("*.bam",
           suffix(".bam"),
           r"nreads.dir/\1.nreads")
def countReads(infile, outfile):
    '''Count number of reads in input files.'''

    statement = '''printf "nreads \\t" >> %(outfile)s'''

    P.run(statement)

    statement = '''samtools view %(infile)s | wc -l | xargs printf >> %(outfile)s'''

    P.run(statement)

However, it gives the following error when trying to evaluate the second P.run():

 Exception #1 \
#                                     'builtins.ZeroDivisionError(float division by zero)' raised in ... \
#                                      Task = def pipeline_bamstats.countReads(...): \
#                                      Job  = [Brain-F1-R1.bam -> nreads.dir/Brain-F1-R1.nreads] \
#                                    \
#                                   Traceback (most recent call last): \
#                                     File "/ifs/devel/adamc/cgat-developers/conda-install/envs/cgat-flow-full/lib/python3.6/site-packages/ruffus/task.py", line 748, in run_pooled_job_without_exceptions \
#                                       register_cleanup, touch_files_only) \
#                                     File "/ifs/devel/adamc/cgat-developers/conda-install/envs/cgat-flow-full/lib/python3.6/site-packages/ruffus/task.py", line 566, in job_wrapper_io_files \
#                                       ret_val = user_defined_work_func(*params) \
#                                     File "/ifs/devel/adamc/cgat-developers/cgat-flow/cgatpipelines/tools/pipeline_bamstats.py", line 241, in countReads \
#                                       P.run(statement) \
#                                     File "/ifs/devel/adamc/cgat-developers/cgat-core/cgatcore/pipeline/execution.py", line 1211, in run \
#                                       benchmark_data = r.run(statement_list) \
#                                     File "/ifs/devel/adamc/cgat-developers/cgat-core/cgatcore/pipeline/execution.py", line 803, in run \
#                                       resource_usage)) \
#                                     File "/ifs/devel/adamc/cgat-developers/cgat-core/cgatcore/pipeline/execution.py", line 703, in collect_benchmark_data \
#                                       100.0 * cpu_time / (max(1.0, end_time) - start_time) / self.job_threads), \
#                                   ZeroDivisionError: float division by zero \

When I rewrite the function to:

@follows(mkdir("nreads.dir"))
@transform("*.bam",
           suffix(".bam"),
           r"nreads.dir/\1.nreads")
def countReads(infile, outfile):
    '''Count number of reads in input files.'''

    statement = '''printf "nreads \\t" >> %(outfile)s && samtools view %(infile)s | wc -l | xargs printf >> %(outfile)s'''

    P.run(statement)

it now works. Is this expected behaviour?

Acribbs commented 5 years ago

Hi @AndreasHeger,

Actually, this is also happening in the intBam function and seems to be related to commit 3764dec50196a4702bff5fe445b36188481791bb, specifically this line: https://github.com/cgat-developers/cgat-core/blob/master/cgatcore/pipeline/execution.py#L702. I'm not entirely sure what is going on, but I will look into it further.
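
For reference, here is a minimal sketch of how the expression in the traceback can divide by zero. It only reproduces the arithmetic from collect_benchmark_data; the function names and the guard are hypothetical and are not taken from cgat-core. It assumes the recorded start and end timestamps are equal, as could happen for a very short job timed at coarse resolution. The thread does not confirm that this is the actual cause.

```python
def unguarded_cpu_percent(cpu_time, start_time, end_time, job_threads=1):
    """Same arithmetic as the expression shown in the traceback (hypothetical wrapper)."""
    return 100.0 * cpu_time / (max(1.0, end_time) - start_time) / job_threads


def guarded_cpu_percent(cpu_time, start_time, end_time, job_threads=1):
    """Return CPU usage as a percentage of wall-clock time, or None if undefined.

    Hypothetical guard for illustration only; not the fix applied in cgat-core.
    """
    wall_time = max(1.0, end_time) - start_time
    if wall_time <= 0 or job_threads <= 0:
        # Elapsed time is zero (or invalid), so a percentage cannot be computed.
        return None
    return 100.0 * cpu_time / wall_time / job_threads


if __name__ == "__main__":
    # Assumed scenario: the job starts and ends at the same recorded timestamp.
    start = end = 1546300800.0
    try:
        unguarded_cpu_percent(0.5, start, end)
    except ZeroDivisionError as exc:
        print("unguarded:", exc)
    print("guarded:", guarded_cpu_percent(0.5, start, end))
```

Returning None rather than 0.0 keeps an undefined measurement distinguishable from a job that genuinely used no CPU; that is a design choice in this sketch, not something stated in the thread.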

Acribbs commented 5 years ago

This was a cgat-core issue and has been resolved by this pull request: https://github.com/cgat-developers/cgat-core/pull/78