galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

Galaxy job failure: bogus escape: u'\\39' #125

Open edwardsnj opened 7 years ago

edwardsnj commented 7 years ago

Galaxy job submission to Pulsar fails when the job directory on the Pulsar side (on Windows) begins with certain two-digit numbers, such as 39. My Pulsar job IDs were of the form 39?, and every attempt to submit from Galaxy to Pulsar failed. Once the job IDs reached 40?, the failures stopped.

I think the issue is that the regular expression library misinterprets Windows paths containing literal backslashes, such as 'blah\\391\\working', and treats the backslash plus the digits that follow as a special escape sequence (which is not valid). I'm not sure why \\38 and \\40 would be more acceptable, but it suggests that some regular expression escaping is needed here (or different job directory names).

Traceback (most recent call last):
  File ".../galaxy/lib/galaxy/jobs/runners/pulsar.py", line 289, in queue_job
    job_id = pulsar_submit_job(client, client_job_description, remote_job_config)
  File ".../galaxy/.venv/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 23, in submit_job
    file_stager = FileStager(client, client_job_description, job_config)
  File ".../galaxy/.venv/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 93, in __init__
    self.__initialize_referenced_tool_files()
  File ".../galaxy/.venv/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 149, in __initialize_referenced_tool_files
    for potential_tool_file in self.job_inputs.find_referenced_subfiles(new_tool_directory):
  File ".../galaxy/.venv/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 342, in find_referenced_subfiles
    return self.find_pattern_references(pattern)
  File ".../galaxy/.venv/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 323, in find_pattern_references
    referenced_files.update(findall(pattern, input_contents))
  File ".../galaxy/.venv/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File ".../galaxy/.venv/lib/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
error: bogus escape: u'\\39'
edwardsnj commented 7 years ago

Ah, first piece of insight: 39 is not a valid octal number (9 is not an octal digit), but 40 is. 38 is probably bad too, but not 37...
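
To illustrate the octal rule, here is a minimal sketch. It assumes a modern Python 3 interpreter, where the message reads differently from the Python 2.7 "bogus escape" above but the parsing rule is the same: a backslash followed by three octal digits is a character code, and any other backslash-plus-digits sequence is treated as a group reference, which fails when no such group exists. The helper name `compiles` is made up for this example.

```python
import re


def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \391 and \378 are not three octal digits, so "\39" and "\37" are read as
# references to groups that do not exist. \101 and \371 are valid octal codes.
for pattern in (r"\391", r"\378", r"\101", r"\371"):
    print(repr(pattern), compiles(pattern))
```

Note that a prefix like \37 can still fail when the third character is not an octal digit, as in \378.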

edwardsnj commented 7 years ago

Here is my fix, which resolved the issue I was having. Note that it is on the client (Galaxy) side. I suspect there are other places where something similar might be needed...

Here is a patch against 0.7.3.

--- up.py.orig  2016-11-11 16:36:05.000000000 -0500
+++ up.py       2016-11-09 18:51:35.000000000 -0500
@@ -338,6 +338,11 @@
         if directory is None:
             return []

+       # if there is any (single) backslash in the directory
+       # why isn't single backslash r'\' valid python? Argh!
+       if '\\' in directory:
+           # replace each single backslash with a double backslash
+           directory = directory.replace('\\','\\\\')
         pattern = r"(%s%s\S+)" % (directory, sep)
         return self.find_pattern_references(pattern)
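
A possibly more general alternative to doubling backslashes by hand is `re.escape`, which escapes every regex metacharacter in the directory name, not just backslashes. The following is only a sketch under assumptions: `build_reference_pattern` is a hypothetical helper, not Pulsar's actual code, and the paths are invented for illustration. (As for the comment in the patch: a raw string literal cannot end in an odd number of backslashes, which is why `r'\'` is a syntax error.)

```python
import re


def build_reference_pattern(directory, sep):
    """Build a pattern matching paths under `directory` (hypothetical helper).

    Both the directory and the separator are escaped so that Windows
    backslashes are matched literally instead of starting an escape sequence.
    """
    return r"(%s%s\S+)" % (re.escape(directory), re.escape(sep))


# Invented example values; a real job directory will differ.
directory = "C:\\pulsar\\staging\\391"
text = "python C:\\pulsar\\staging\\391\\tool_files\\run.py --in x"

pattern = build_reference_pattern(directory, "\\")
matches = re.findall(pattern, text)
print(matches)

# The unescaped pattern is rejected by the re module.
try:
    re.compile(r"(%s%s\S+)" % (directory, "\\"))
    naive_ok = True
except re.error:
    naive_ok = False
```

Escaping the separator as well matters on Windows, since an unescaped backslash separator would otherwise fuse with the `\S` that follows it.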
williambarshop commented 7 years ago

I have been having this same issue on my Galaxy/Pulsar setup. I have no information beyond what edwardsnj has suggested, but I can certainly replicate the issue!

The stack trace is included below:

Traceback (most recent call last):
  File "/galaxy-central/lib/galaxy/jobs/runners/pulsar.py", line 296, in queue_job
    job_id = pulsar_submit_job(client, client_job_description, remote_job_config)
  File "/galaxy_venv/local/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 23, in submit_job
    file_stager = FileStager(client, client_job_description, job_config)
  File "/galaxy_venv/local/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 93, in __init__
    self.__initialize_referenced_tool_files()
  File "/galaxy_venv/local/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 149, in __initialize_referenced_tool_files
    for potential_tool_file in self.job_inputs.find_referenced_subfiles(new_tool_directory):
  File "/galaxy_venv/local/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 342, in find_referenced_subfiles
    return self.find_pattern_references(pattern)
  File "/galaxy_venv/local/lib/python2.7/site-packages/pulsar/client/staging/up.py", line 323, in find_pattern_references
    referenced_files.update(findall(pattern, input_contents))
  File "/galaxy_venv/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "/galaxy_venv/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
error: bogus escape: u'\37'