galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

Admin WG - Collected Pulsar Issue(s) #240

Closed: hexylena closed this issue 3 years ago

hexylena commented 3 years ago
gmauro commented 3 years ago

Fix for Pulsar to transfer extra files in the DESeq2 wrapper - https://github.com/galaxyproject/tools-iuc/pull/3420/files

natefoo commented 3 years ago

Pulsar seems to be broken with globs in from_work_dir outputs: galaxyproject/pulsar#239
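
To support a glob in a from_work_dir output, the pattern has to be expanded inside the job's working directory instead of being treated as a literal file name. The following is a minimal sketch. It assumes the pattern is relative to the working directory and that the first match in sorted order is the file to collect. resolve_from_work_dir is a hypothetical helper, not Pulsar's or Galaxy's code:

```python
import glob
import os


def resolve_from_work_dir(working_dir, pattern):
    """Return the file matching a from_work_dir pattern, or None.

    Hypothetical helper: assumes `pattern` is a shell-style glob relative
    to the job's working directory and that the first match in sorted
    order is the output to collect.
    """
    # Expand relative to the working directory, not the process's cwd.
    matches = sorted(glob.glob(os.path.join(working_dir, pattern)))
    # A glob can also match directories; only regular files are outputs.
    files = [m for m in matches if os.path.isfile(m)]
    return files[0] if files else None
```

Returning None instead of raising leaves the caller to decide whether a missing match should fail the job.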

Slugger70 commented 3 years ago

BLAST wrappers don't work on custom databases: the database file is transferred, but its index files are not. I've found that unless files are explicitly referenced on the command line, they are not transferred.
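
The missing step is to stage every companion file that belongs to an input, not just the path that appears on the command line. The following is a minimal sketch. It assumes a dataset consists of a primary file plus an optional directory of companion files, such as the index files of a custom database. files_to_stage and its parameters are hypothetical and are not Pulsar's API:

```python
import os


def files_to_stage(primary_path, extra_files_dir=None):
    """Return every path that must be copied for one input dataset.

    Hypothetical helper: a dataset is assumed to be a primary file plus an
    optional directory of companion files (e.g. BLAST index files).
    Staging only primary_path reproduces the problem described above.
    """
    paths = [primary_path]
    if extra_files_dir and os.path.isdir(extra_files_dir):
        # Walk the whole directory so nested companion files are included.
        for root, _dirs, names in os.walk(extra_files_dir):
            paths.extend(os.path.join(root, n) for n in sorted(names))
    return paths
```

If you call it with only the primary path, it returns just that file. That is the behavior reported for the BLAST wrappers.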

cat-bro commented 3 years ago

There is an infrequent, uncaught error when transferring files from Galaxy to Pulsar:

Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]: 2021-02-24 02:19:36,847 ERROR [pulsar.managers.stateful][[manager=_default_]-[action=preprocess]-[job=2101382]] Failed job preprocessing for job 2101382:
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]: Traceback (most recent call last):
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/managers/stateful.py", line 120, in _handling_of_preprocessing_state
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     yield
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/managers/stateful.py", line 111, in do_preprocess
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     preprocess(job_directory, setup_config, self.__preprocess_action_executor, object_store=self.object_store)
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/managers/staging/pre.py", line 19, in preprocess
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     action_executor.execute(lambda: action.write_to_path(path), "action[%s]" % description)
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/managers/util/retry.py", line 42, in execute
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     return _retry_over_time(
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/managers/util/retry.py", line 93, in _retry_over_time
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     return fun(*args, **kwargs)
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/managers/staging/pre.py", line 19, in <lambda>
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     action_executor.execute(lambda: action.write_to_path(path), "action[%s]" % description)
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/client/action_mapper.py", line 465, in write_to_path
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     get_file(self.url, path)
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:   File "/mnt/pulsar/venv/lib/python3.8/site-packages/pulsar/client/transport/curl.py", line 93, in get_file
Feb 24 02:19:36 pulsar-mel3 pulsar[2763306]:     c.perform()
Feb 24 02:19:37 pulsar-mel3 pulsar[2763306]: pycurl.error: (18, 'transfer closed with 1355529374 bytes remaining to read')

This is not predictable from the inputs. Often the user resubmits the job with the same inputs and the error does not occur again.

natefoo commented 3 years ago

@cat-bro You can tell Pulsar to retry interrupted transfers like this. If you aren't using it already, give it a shot and see if it helps.
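
The traceback above already passes through a retry wrapper (pulsar/managers/util/retry.py), so the suggestion here is about configuring retries rather than adding them. The configuration natefoo linked to is not preserved in this thread. As a generic illustration only, the sketch below retries a transient failure such as pycurl error 18 with a capped, linearly growing delay. call_with_retries and its parameter names are hypothetical and are not Pulsar's configuration keys:

```python
import time


def call_with_retries(fun, max_retries=10, interval_start=2.0,
                      interval_step=2.0, interval_max=30.0,
                      catch=(Exception,), sleep=time.sleep):
    """Call fun(), retrying on failure with a linearly growing delay.

    Hypothetical sketch: the delay starts at interval_start, grows by
    interval_step after each failure and is capped at interval_max.
    Once max_retries retries have failed, the last exception is re-raised.
    """
    interval = interval_start
    for attempt in range(max_retries + 1):
        try:
            return fun()
        except catch:
            if attempt == max_retries:
                raise
            sleep(interval)
            interval = min(interval + interval_step, interval_max)
```

A capped linear delay keeps the first retries quick and avoids hammering Galaxy during a longer outage. Exponential backoff is a common alternative.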

Slugger70 commented 3 years ago

Another issue I've come across a number of times: when Galaxy runs a job locally, it creates all of the expected output files (empty) at the beginning of the job. Pulsar, however, doesn't do this. If a tool fails for whatever reason, Pulsar sometimes can't find the output files to transfer back. This happens especially when users can select which outputs they want and the tool doesn't produce one of them. In Galaxy that's fine, because an empty file exists.
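
The Galaxy-like behavior described here amounts to touching every expected output before the tool runs. The following is a minimal sketch. precreate_outputs is hypothetical and is not Pulsar's code, and the list of expected output paths is assumed to be known up front:

```python
import os


def precreate_outputs(output_paths):
    """Create an empty file for every expected output that does not exist.

    Hypothetical sketch: if the tool later fails or skips an output, there
    is still an (empty) file to transfer back instead of a missing one.
    Returns the paths that were newly created.
    """
    created = []
    for path in output_paths:
        # Make sure the parent directory exists ("." for bare file names).
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        if not os.path.exists(path):
            # Append mode never truncates, so existing content is safe.
            open(path, "a").close()
            created.append(path)
    return created
```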

Slugger70 commented 3 years ago

Adding to the above issue: where does output filtering occur in Galaxy? After the job has completed? Does Galaxy create all the output files and then filter them at the end, or does it create only the ones the filters expect? Pulsar seems to want to send back to Galaxy files that don't exist, because they would normally be filtered out (using output filters).
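
Wherever Galaxy applies the filters, the behavior Slugger70 expects can be sketched like this: evaluate the filters against the job's parameters first, and only look for outputs that pass. Real Galaxy filters are Python expressions in the tool XML. Here they are plain callables for illustration, and outputs_to_collect is hypothetical:

```python
def outputs_to_collect(outputs, params):
    """Return the names of outputs that should be sent back to Galaxy.

    Hypothetical sketch: `outputs` maps an output name to a list of filter
    predicates, each taking the tool's parameter dict. An output is kept
    only if every predicate returns True; an output with no filters is
    always kept. Filtered-out outputs are never expected to exist.
    """
    return [
        name for name, filters in outputs.items()
        if all(f(params) for f in filters)
    ]
```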

Slugger70 commented 3 years ago

from_work_dir globs: Glob/wildcard in from_work_dir outputs doesn't work (#239); outputs_to_job_dir doesn't work: Pulsar does not find collection files (#212). Both FIXED with PR #257.

Slugger70 commented 3 years ago

No stderr/stdout viewable, so users cannot see logs when tools crash: Missing stderr/stdout on pulsar jobs (#211). FIXED with PR #258.
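
To make these logs viewable, the tool's stdout and stderr must be captured to files that outlive the process. They can then be shipped back to Galaxy even when the tool exits non-zero. The following is a minimal sketch using the standard library's subprocess.run. run_and_capture and the file names stdout and stderr are hypothetical, and the sketch makes no claim about how PR #258 implements this:

```python
import os
import subprocess


def run_and_capture(command, job_dir):
    """Run a tool command, writing stdout/stderr to files in job_dir.

    Hypothetical sketch: because the streams go to files rather than
    being discarded, they survive a crash and can be sent back to Galaxy.
    Returns (exit code, stdout path, stderr path).
    """
    stdout_path = os.path.join(job_dir, "stdout")
    stderr_path = os.path.join(job_dir, "stderr")
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        proc = subprocess.run(command, stdout=out, stderr=err)
    return proc.returncode, stdout_path, stderr_path
```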

natefoo commented 3 years ago

Hey @Slugger70, did you (or maybe @cat-bro) verify that collection outputs are valid and not just green and empty? I'm not sure how #257 or #258 fixes #212, but I can imagine how #258 might cause green, empty datasets. Although I don't know why collection outputs work, so who knows.

natefoo commented 3 years ago

Replying to @Slugger70's comment above about Pulsar not pre-creating empty output files the way Galaxy does:

For the record, this should be fixed by #257 for from_work_dir outputs. I just did a quick check: without any of the new PRs, Pulsar doesn't precreate defined, non-from_work_dir outputs. However, it also doesn't force a job failure if the tool fails to create any of those outputs. That only happened with from_work_dir outputs.
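
The pre-PR behavior described here can be summarized as a collection step that tolerates missing ordinary outputs but fails on missing from_work_dir ones. The following is a minimal sketch under that reading. collect_outputs, MissingOutputError and the (path, from_work_dir) input shape are all hypothetical:

```python
import os


class MissingOutputError(Exception):
    """Raised when a required output file was not produced."""


def collect_outputs(outputs):
    """Return the output paths that exist, enforcing from_work_dir ones.

    Hypothetical sketch: `outputs` is a list of (path, from_work_dir)
    pairs. A missing ordinary output is silently skipped, while a missing
    from_work_dir output fails the job.
    """
    collected = []
    for path, from_work_dir in outputs:
        if os.path.exists(path):
            collected.append(path)
        elif from_work_dir:
            raise MissingOutputError(path)
    return collected
```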

natefoo commented 3 years ago

The remaining issues are on the Admin project board, so I don't think we need to keep this growing, mutating issue open indefinitely. That is what the project board is for.