galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

_handle_script_integrity slows down queueing of tools #9711

Open mvdbeek opened 4 years ago

mvdbeek commented 4 years ago

_handle_script_integrity is supposed to delay handing the job script over to the DRM in case "Text file busy" or other issues occur. "Text file busy" may occur when something attempts to execute the script while it is still open for writing by a process (in all probability the current Python thread, unless there is a race condition).

In my profiling, _handle_script_integrity accounts for about 200 ms of the time spent in queue_job. That is pretty bad if you're trying to launch a thousand jobs, which is not so uncommon these days.
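To put that figure in perspective: at roughly 200 ms per call, queueing 1000 jobs one after another spends about 200 s in the integrity check alone. Below is a minimal sketch of how such a measurement could be taken. `timed` and `fake_integrity_check` are hypothetical stand-ins, not Galaxy code, and the 0.2 s cost is simply the number quoted above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fake_integrity_check():
    # Hypothetical stand-in for the real check; it only sleeps briefly.
    time.sleep(0.01)


results = {}
with timed("integrity_check", results):
    fake_integrity_check()

per_call = 0.2   # seconds per job, the figure quoted above
n_jobs = 1000
total_seconds = per_call * n_jobs  # serial cost of the check alone
```

Wrapping only the suspect call keeps the measurement separate from the rest of the queueing path.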

One easy improvement is https://github.com/galaxyproject/galaxy/pull/9691#issuecomment-623595665.

Another thing we could try is something like this:

import psutil

def check_if_path_opened(path):
    for p in psutil.process_iter(['open_files']):
        # 'open_files' is None when psutil is denied access to the process.
        for file in p.info['open_files'] or []:
            if file.path == path:
                return True
    return False

check_if_path_opened('/Users/mvandenb/src/galaxy/lib/test.sh')

And finally we should check if maybe we have a race condition when this actually happens and try to solve it?

@natefoo, any chance you could count the number of times usegalaxy.org has seen "Script integrity error for file" since moving to Python 3?

nsoranzo commented 4 years ago

> Text file busy may occur when the script is still open for writing by a process (and in all probability this should be the current python thread, unless there is a race condition) when attempting to execute the script.

If (and it's a big if) the Text file busy issue is at the kernel level as explained in https://stackoverflow.com/questions/52375701/how-to-completely-close-the-file-write-before-starting-another-process-which-use , then the Python code is not at fault.
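If the error really does originate outside the Python code that wrote the file, one generic way to cope is to retry the exec itself only when it fails with ETXTBSY. The sketch below illustrates that idea and is not Galaxy's implementation. `run_with_etxtbsy_retry`, its defaults, and the `runner` parameter (there only so the retry logic can be exercised without a real busy file) are all hypothetical.

```python
import errno
import subprocess
import sys
import time


def run_with_etxtbsy_retry(argv, attempts=5, delay=0.05, runner=subprocess.call):
    """Run argv, retrying only when exec fails with ETXTBSY ("Text file busy")."""
    for attempt in range(attempts):
        try:
            return runner(argv)
        except OSError as exc:
            # Re-raise any other error, or ETXTBSY once attempts are used up.
            if exc.errno != errno.ETXTBSY or attempt == attempts - 1:
                raise
            time.sleep(delay)


# Usage: any executable works here; a job script path would go in its place.
returncode = run_with_etxtbsy_retry([sys.executable, "-c", "raise SystemExit(0)"])
```

Unlike running a probe command first, this adds no cost on the common path where the first exec succeeds.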

> Another thing we could try is something like this:
>
> def check_if_path_opened(path):
>     for p in psutil.process_iter(['open_files']):
>         for file in p.info['open_files'] or []:
>             if file.path == path:
>                 return True
> check_if_path_opened('/Users/mvandenb/src/galaxy/lib/test.sh')
>
> And finally we should check if maybe we have a race condition when this actually happens and try to solve it?
>
> @natefoo, any chance you could count the number of times usegalaxy.org has seen Script integrity error for file since moving to python 3 ?

Under the same hypothesis, none of these 3 approaches would help. But we could remove the subprocess.check_call(INTEGRITY_SYNC_COMMAND) call (i.e. /bin/sync) from _handle_script_integrity(), since that call probably just makes things worse.
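For comparison, /bin/sync flushes every mounted filesystem, whereas os.fsync flushes a single file. If some flush is still wanted after dropping the system-wide sync (an assumption here, and whether any flush affects Text file busy is exactly what is in doubt), a per-file variant could look like the sketch below. `write_script` is a hypothetical helper, not an existing Galaxy function.

```python
import os
import stat
import tempfile


def write_script(path, contents):
    """Write an executable script, flushing only this file before closing it."""
    with open(path, "w") as handle:
        handle.write(contents)
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # flush this one file, not every filesystem
    # The descriptor is closed here, before anything tries to execute the file.
    os.chmod(path, stat.S_IRWXU)


script_path = os.path.join(tempfile.mkdtemp(), "job.sh")
write_script(script_path, "#!/bin/sh\nexit 0\n")
```

The point of the `with` block is that the write descriptor is closed in this process before the script is handed on.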