CGATOxford / CGATPipelines

Collection of CGAT NGS Pipelines
MIT License

Pipeline.RuffusLoggingFilter raises error when file paths exceed 85 characters #33

Closed jethrojohnson closed 9 years ago

jethrojohnson commented 9 years ago

Ruffus tasks involving file paths > 85 characters are failing when running 'make'.

It seems that either ruffus's pipeline_printout() or StringIO.getvalue() splits lines longer than 85 characters, so the string argument (ruffus_text) passed to RuffusLoggingFilter contains tab separations that cause split_by_job to fail.

For example, the following task works fine:

@transform("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/dummy.gz",
           regex("(.+)/(.+).gz"),
           r"\1/\2_out.gz")
def shortTask(infile, outfile):
    pass

giving:

Job  = [aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/dummy.gz -> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/dummy_out.gz]

Whereas:

@transform("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/dummyfile.gz",
           regex("(.+)/(.+).gz"),
           r"\1/\2_out.gz")
def longTask(infile, outfile):
    pass

results in:

'              [aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/dummyfile               .gz             ->               aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/dummyfile_               out.gz]'

causing the following error:

Traceback (most recent call last):
  File "/ifs/devel/projects/proj025/pipeline_p25.py", line 1753, in <module>
    sys.exit( P.main(sys.argv) )
  File "/ifs/devel/jethro/CGATPipelines/CGATPipelines/Pipeline.py", line 2468, in main
    exchange=options.rabbitmq_exchange)
  File "/ifs/devel/jethro/CGATPipelines/CGATPipelines/Pipeline.py", line 2115, in __init__
    for task_name, task_status, jobs in split_by_task(ruffus_text):
  File "/ifs/devel/jethro/CGATPipelines/CGATPipelines/Pipeline.py", line 2098, in split_by_task
    yield task_name, task_status, list(split_by_job(block))
  File "/ifs/devel/jethro/CGATPipelines/CGATPipelines/Pipeline.py", line 2073, in split_by_job
    raise AttributeError("could not parse '%s'" % line)

This issue can be fixed by prepending \s* to the regex in split_by_job(). However, I don't know if it's part of a wider problem.
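
To illustrate why a leading \s* helps, here is a minimal sketch. The two patterns and the parse_job() helper are hypothetical stand-ins, since the actual regex in split_by_job() is not quoted in this thread; the sketch only assumes that the parser anchors its match at the start of the line, as re.match does, and it handles leading whitespace only, not whitespace inserted in the middle of a path.

```python
import re

# Hypothetical patterns: the real regex in Pipeline.split_by_job() is not
# shown in this thread. STRICT requires '[' as the very first character;
# LENIENT tolerates leading whitespace via a prepended \s*.
STRICT = re.compile(r"\[(.+)\s+->\s+(.+)\]")
LENIENT = re.compile(r"\s*\[(.+)\s+->\s+(.+)\]")


def parse_job(line, pattern):
    """Return (infile, outfile) from a job line, or raise AttributeError."""
    match = pattern.match(line)  # re.match anchors at the start of the line
    if match is None:
        raise AttributeError("could not parse '%s'" % line)
    return tuple(part.strip() for part in match.groups())


# An unwrapped line, and a line that starts with indentation (illustrative
# paths, much shorter than the ones in the report above).
unwrapped = "[in/dummy.gz -> in/dummy_out.gz]"
wrapped = "      [in/dummyfile.gz -> in/dummyfile_out.gz]"

print(parse_job(unwrapped, STRICT))
print(parse_job(wrapped, LENIENT))

try:
    parse_job(wrapped, STRICT)
except AttributeError as error:
    print("strict pattern failed:", error)
```

With these stand-in patterns, the strict version rejects the indented line with the same kind of AttributeError seen in the traceback, while the version with the prepended \s* parses it.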

AndreasHeger commented 9 years ago

Thanks Jethro, that should be the only instance where this is a problem.

I will create a fix and pull request.

AndreasHeger commented 9 years ago

Please see #34, does it work?

jethrojohnson commented 9 years ago

Thanks Andreas - the above example (longTask) works fine on branch AH-FixForIssue33.

AndreasHeger commented 9 years ago

Thanks, closed!