galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

Files are not copied to Galaxy (sometimes) #317

Closed mira-miracoli closed 1 year ago

mira-miracoli commented 1 year ago

I observe mixed behaviour in my Pulsar endpoint:

Currently I am unable to spot any pattern or any reliable setting that works; I will keep observing as I change parts of the setup. On top of that, some tools are not working as expected, e.g. ENAsearch is producing wrong output for xml, and Galaxy handlers sometimes do not pick up status messages, which is fixed by a handler restart.

OS

RockyLinux 9.1

Pulsar

pulsar-app==0.14.13

Galaxy

23.0europe

mvdbeek commented 1 year ago

0.14.13 is more than a year old (see https://github.com/galaxyproject/pulsar/compare/0.14.13...0.15.0.dev1 for changes). I would start by updating pulsar, then collect which tools produce errors in which category. We don't need data for all tools, a couple of examples should be sufficient. For each failing tool it'd then also be great to know the pulsar settings, the object store config as well as the relevant job destination parameters (extended metadata, remote metadata etc).
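
For reference, a minimal upgrade sketch, assuming Pulsar is installed in a virtualenv at /opt/pulsar/venv3 (the path that appears in the tracebacks later in this thread) and runs as a systemd service named pulsar (the unit name is an assumption; adjust to your setup):

# upgrade the pulsar-app package inside Pulsar's virtualenv
$ /opt/pulsar/venv3/bin/pip install --upgrade pulsar-app
# restart the service so the new code is loaded (unit name is an assumption)
$ sudo systemctl restart pulsar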

mvdbeek commented 1 year ago

And Galaxy Handlers are not picking up status messages sometimes, which is fixed by a handler restart

When that happens, can you produce a stack dump with py-spy for the handler that is supposed to process messages?
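
For reference, a sketch of taking such a dump with py-spy; the pgrep pattern and the PID are illustrative:

# install py-spy on the handler host
$ pip install py-spy
# find the PID of the handler that should be processing messages
$ pgrep -af 'python.*handler'
# print the current stack of every thread in that process (may need sudo)
$ py-spy dump --pid 123456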

mira-miracoli commented 1 year ago

Thank you @mvdbeek, I will try to write everything down and update the VGCN images to 0.14.15.

When that happens, can you produce a stack dump with py-spy for the handler that is supposed to process messages?

Yes I can do that

mtangaro commented 1 year ago

Hi all, I’m trying to find a working configuration of our pulsar endpoint with usegalaxy.eu. Unfortunately, I’m also seeing mixed behaviour, and I’m unable to pin down the problems.

Run simple fastqc job.

I see a different behaviour for mulled containers, for instance a bowtie2 job with paired-end data and sacCer3 reference data from the history.

Finally, if I set the "job home directory" as $HOME, the issue is there:

pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602$ export HOME=/data/share/staging/58490602/home
pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602$ echo $HOME
/data/share/staging/58490602/home
pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602$ cd working/
pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602/working$ singularity -v -s exec --cleanenv -B /data/share/staging/58490602:/data/share/staging/58490602 -B /data/share/staging/58490602/tool_files:/data/share/staging/58490602/tool_files:ro -B /data/share/staging/58490602/outputs:/data/share/staging/58490602/outputs -B /data/share/staging/58490602/working:/data/share/staging/58490602/working --home /data/share/staging/58490602/home:/data/share/staging/58490602/home docker://quay.io/biocontainers/mulled-v2-c742dccc9d8fabfcff2af0d8d6799dbc711366cf:e292900a5ccb65f879af393340758539ef14f345-0 /bin/bash /data/share/staging/58490602/tool_script.sh
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures

Pulsar configuration details:

Pulsar Endpoint configuration on usegalaxy.eu: https://github.com/usegalaxy-eu/infrastructure-playbook/blob/7d0cfba99562e8ec5eb99c9939d71b9d91f00e43/files/galaxy/dynamic_rules/usegalaxy/destination_specifications.yaml#L737

Pulsar app.yml: https://gist.githubusercontent.com/mtangaro/1ffc7b6f07ba255ac72398401cc69baf/raw/d8192e8b3a56aa60a169436e8c0743a935c56870/gistfile1.txt

local_env.sh: no configurations

server.ini: https://gist.githubusercontent.com/mtangaro/eefbcac22be194fb879e88e279a45a48/raw/00e9488fd81d8b443a16e54e58248d281d16660a/gistfile1.txt

OS: RockyLinux 9

Pulsar: 0.14.15

mvdbeek commented 1 year ago

How did you both end up with 0.14.15? That version was only current for a single day and had a major bug.

mtangaro commented 1 year ago

How did you both end up with 0.14.15? That version was only current for a single day and had a major bug.

Because for EuroScienceGateway we are both using the same image with the same Pulsar version. Do you recommend 0.14.16 or 0.15.0.dev1 (or both, in two different endpoints)?

mira-miracoli commented 1 year ago

I haven't seen any release notes, so I didn't know there was a bug. Which version do you recommend? 0.14.16 or one of the dev versions?

mvdbeek commented 1 year ago

Release notes are here and here, but even without the notes, why not go with the current stable version? The pre-release version updates the k8s coexecution strategy to also work for TES and adds a new option for AMQP routes: https://github.com/galaxyproject/pulsar/compare/0.14.16...0.15.0.dev1#diff-e2de3cbc57ab9b097fca7d7c444c094ba3c1d4ee92b36d972d0d9d49669436cc

hexylena commented 1 year ago

and had a major bug

should we update the changelog then? it only notes "small regressions bugs"

mvdbeek commented 1 year ago

it's a small regression. if we consider this bad we can yank the release, but how did anyone end up with a release that was less than 24h old?

mvdbeek commented 1 year ago

is this some "let's not use the latest version" thinking? please don't do that, most of what's being added in patch releases is bug fixes

mtangaro commented 1 year ago

Hi Marius, again I'm very sorry for using the wrong Pulsar version. The update to 0.14.16 solved the upload issue and fastqc is finally working (again).

However, the issue with bowtie2 is still there: the job remains stuck. The pulsar log: https://gist.githubusercontent.com/mtangaro/c5b24618beecbad0ab8be51169660d00/raw/e61408ca90a3e847d3d557019f65f2c0c0247528/gistfile1.txt And the process on the exec node:

pulsar    140719  1.1  0.1 1562864 45920 ?       Sl   16:10   0:00 singularity -s exec --cleanenv -B /data/share/staging/58540489:/data/share/staging/58540489 -B /data/share/staging/58540489/tool_files:/data/share/staging/58540489/tool_files:ro -B /data/share/staging/58540489/outputs:/data/share/staging/58540489/outputs -B /data/share/staging/58540489/working:/data/share/staging/58540489/working --home /data/share/staging/58540489/home:/data/share/staging/58540489/home docker://quay.io/biocontainers/mulled-v2-c742dccc9d8fabfcff2af0d8d6799dbc711366cf:e292900a5ccb65f879af393340758539ef14f345-0 /bin/bash /data/share/staging/58540489/tool_script.sh

The singularity container is still not running:

$ singularity instance list
INSTANCE NAME    PID    IP    IMAGE
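
Worth noting: singularity instance list only shows containers started with singularity instance start; a plain singularity exec, which is what Pulsar runs here, never appears in that list even while the job is alive, so the process table is the more reliable check:

# an exec-style container shows up as an ordinary process, not an instance
$ pgrep -af 'singularity.*tool_script.sh'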

Do you have any idea?

mvdbeek commented 1 year ago

That's a different issue, and probably not related to pulsar. That said:

Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred

does the user that runs the job have that variable set? If that's needed I would set it in your job destination

mvdbeek commented 1 year ago

I also see WARNING: Cache disabled - cache location /opt/pulsar is not writable. with the manual run, not sure which way you want to go with this, but the cache should be on a fast local disk.
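
A sketch of one way to address both points via Pulsar's local_env.sh (listed above as having no configuration); the cache path is an example and should point at a fast local disk writable by the pulsar user:

# local_env.sh: sourced by Pulsar before running jobs
export APPTAINER_CACHEDIR=/scratch/apptainer-cache    # example path on a fast local disk
export SINGULARITY_CACHEDIR="$APPTAINER_CACHEDIR"     # older runtimes still read this name
mkdir -p "$APPTAINER_CACHEDIR"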

bgruening commented 1 year ago

Ok, a bit of progress. With the latest +dev1 version the None entries in the paths are gone. A few tools are now working. More complicated tools like bowtie2 and star fail in the Galaxy UI with "Job 58583504's output dataset(s) could not be read".

The above error occurs with outputs_to_working_directory enabled. If I disable the outputs_to_working_directory setting I get this traceback:

Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: Traceback (most recent call last):
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/staging/down.py", line 93, in __collect_working_directory_outputs
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:     self.output_files.remove(output_file)
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: ValueError: list.remove(x): x not in list
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: During handling of the above exception, another exception occurred:
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: Traceback (most recent call last):
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/stateful.py", line 223, in do_postprocess
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:     postprocess_success = postprocess(job_directory, self.__postprocess_action_executor)
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 23, in postprocess
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:     collected = __collect_outputs(job_directory, staging_config, action_executor)
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 38, in __collect_outputs
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:     collection_failure_exceptions = results_collector.collect()
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/staging/down.py", line 69, in collect
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:     self.__collect_working_directory_outputs()
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/staging/down.py", line 95, in __collect_working_directory_outputs
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]:     raise Exception("Failed to remove {} from {}".format(output_file, self.output_files))
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: Exception: Failed to remove None from ['/data/dnb08/galaxy_db/files/1/9/5/dataset_1951953a-c2dc-42d9-ae6e-82ac5a4684e2.dat', '/data/dnb08/galaxy_db/files/a/8/d/dataset_a8dbe145-63a8-48d3-8106-4beeeb0721cc.dat', '/data/dnb08/galaxy_db/files/3/4/e/dataset_34e6f145-8543-4746-bf58-abc5f760a5e1.dat']

During debugging I found that we are not setting --pwd in Singularity, which I think would be good, to force a working dir.
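
For reference, Apptainer/Singularity does accept an explicit working directory via --pwd; a sketch against the manual invocation shown earlier in this thread (same staging paths, --pwd added):

$ singularity exec --cleanenv \
    --pwd /data/share/staging/58490602/working \
    -B /data/share/staging/58490602:/data/share/staging/58490602 \
    docker://quay.io/biocontainers/mulled-v2-c742dccc9d8fabfcff2af0d8d6799dbc711366cf:e292900a5ccb65f879af393340758539ef14f345-0 \
    /bin/bash /data/share/staging/58490602/tool_script.sh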

I cannot prove it, but I have the feeling that handlers are forgetting Pulsar jobs from time to time. I am trying to understand this, but watch out for it if you debug Pulsar.

mvdbeek commented 1 year ago

The above error occurs with outputs_to_working_directory enabled.

that is not a setting that makes sense or is compatible with pulsar, see https://github.com/galaxyproject/pulsar/issues/193#issuecomment-1106377936

During debugging I found that we are not setting --pwd in Singularity, which I think would be good, to force a working dir.

that's not needed for .org and we are running pulsar with singularity (correct me if I'm wrong @natefoo)

I'll take a look at the traceback, it looks like something I already fixed in a different spot of the codebase. Do you know what tool this was?

bgruening commented 1 year ago

The above error occurs with outputs_to_working_directory enabled.

that is not a setting that makes sense or is compatible with pulsar, see https://github.com/galaxyproject/pulsar/issues/193#issuecomment-1106377936

Yes, I know, but if this is part of our default destinations (non-pulsar settings) I think Pulsar should ignore it. But it seems it actually does something with it.

During debugging I found that we are not setting --pwd in Singularity, which I think would be good, to force a working dir.

that's not needed for .org and we are running pulsar with singularity (correct me if I'm wrong @natefoo)

It's not needed, just a convenience for admins and more explicit. Ignore it, it was just an observation.

I'll take a look at the traceback, it looks like something I already fixed in a different spot of the codebase. Do you know what tool this was?

FASTQC and Star. Funnily, FASTQC works with outputs_to_working_directory=true and turns green, but crashes with outputs_to_working_directory=false.

On EU you have an option under user preferences where you can choose your destinations:

[screenshot: destination selection in the EU user preferences]

bgruening commented 1 year ago

(sorry edited your comment instead of replying)

mvdbeek commented 1 year ago

But crashes with outputs_to_working_directory=false

that is probably fixed by https://github.com/galaxyproject/galaxy/pull/15918/commits/ff8835b75771a79922faa209cb65c2c192f015ad ... at least it was flagged in the framework tests

bgruening commented 1 year ago

Thanks @mvdbeek I'm happy to test this on EU as soon as you think it's ready.

bgruening commented 1 year ago

Thanks @mvdbeek! I have updated Galaxy to the latest 23.0 commit and the pulsar server to dev2.

Star fails with the None error again.

[job=58682678]] Failed to execute staging out file /data/share/staging/58682678/working/Log.final.out via FileAction[path=None,action_type=remote_transfer,url=https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a2ff31a8386e3421c&job_key=b269333ccac1e9147600fa818dc8aa4a&path=None&file_type=output_workdir], retrying in 2.0 seconds.
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Traceback (most recent call last):
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/util/retry.py", line 93, in _retry_over_time
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     return fun(*args, **kwargs)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 82, in <lambda>
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     self.action_executor.execute(lambda: action.write_from_path(pulsar_path), description)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/action_mapper.py", line 482, in write_from_path
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     post_file(self.url, pulsar_path)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/transport/curl.py", line 77, in post_file
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     raise Exception(message)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Exception: Failed to post_file properly for url https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a2ff31a8386e3421c&job_key=b269333ccac1e9147600fa818dc8aa4a&path=None&file_type=output_workdir, remote server returned status code of 500.
Apr 10 23:24:36 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: 2023-04-10 23:24:36,852 INFO  [pulsar.managers.util.retry][[manager=production]-[action=postprocess]-[job=58682678]] Failed to execute staging out file /data/share/staging/58682678/working/Log.final.out via FileAction[path=None,action_type=remote_transfer,url=https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a2ff31a8386e3421c&job_key=b269333ccac1e9147600fa818dc8aa4a&path=None&file_type=output_workdir], retrying in 4.0 seconds.

Fastqc fails similarly with:


Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: 2023-04-10 23:28:34,416 INFO  [pulsar.managers.util.retry][[manager=production]-[action=postprocess]-[job=58682692]] Failed to execute staging out file /data/share/staging/58682692/working/output.html via FileAction[path=None,action_type=remote_transfer,url=https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a1f919eeaac1d4c23&job_key=b269333ccac1e91404917e3d08c2ca32&path=None&file_type=output_workdir], retrying in 2.0 seconds.
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Traceback (most recent call last):
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/util/retry.py", line 93, in _retry_over_time
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     return fun(*args, **kwargs)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 82, in <lambda>
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     self.action_executor.execute(lambda: action.write_from_path(pulsar_path), description)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/action_mapper.py", line 482, in write_from_path
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     post_file(self.url, pulsar_path)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/transport/curl.py", line 77, in post_file
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]:     raise Exception(message)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Exception: Failed to post_file properly for url https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a1f919eeaac1d4c23&job_key=b269333ccac1e91404917e3d08c2ca32&path=None&file_type=output_workdir, remote server returned status code of 500.
mvdbeek commented 1 year ago

it'd be amazing if we could hook up pulsar to sentry ... i wonder why we don't have a path

bgruening commented 1 year ago

Is that a feature or deployment wish? ;)

mvdbeek commented 1 year ago

I started work on this 😆

mvdbeek commented 1 year ago

hmm, I can't get it to fail ... which is good and bad 😆. That's still fastqc with https://github.com/usegalaxy-eu/infrastructure-playbook/blob/fc5d2f3438ed6edbc8f8d26e5b96056c1c3e3cd2/files/galaxy/dynamic_rules/usegalaxy/destination_specifications.yaml#L737 ?

bgruening commented 1 year ago

The one above, with outputs_to_working_directory = false. You can choose that in your user preferences on EU.

mvdbeek commented 1 year ago

It actually all works for me, with or without outputs_to_working_directory, super weird. I'm gonna add the sentry integration, hoping the local variables will help us figure out what's going on.

bgruening commented 1 year ago

Let me know if I can do anything to debug this further. As you know we have celery enabled and metadata_strategy=extended.

mvdbeek commented 1 year ago

I got access to one of the pulsar nodes, and I see it fail there. Deploying some debugging now.

mvdbeek commented 1 year ago

Hmm, I think my best guess at this point is that in https://github.com/mvdbeek/galaxy/blob/629655d300c4da86ede946a733d3b41f16fb37d0/lib/galaxy/jobs/runners/__init__.py#L337 for some reason you get true for the destination, but the false_path is still None. I have no idea how that would happen ...

mvdbeek commented 1 year ago

Could you try https://github.com/galaxyproject/galaxy/commit/fd80dde17346cd45d465326a7f2a52d358464692? It's possible that the way we build the JobIO instances we may have a reference to the global outputs_to_working_directory setting instead of the one specified in the job destination.
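
A sketch of testing that single commit on a deployed 23.0 checkout (the Galaxy root path is illustrative; assumes the commit applies cleanly to your branch):

$ cd /srv/galaxy/server    # illustrative path to the deployed Galaxy checkout
$ git fetch https://github.com/galaxyproject/galaxy.git fd80dde17346cd45d465326a7f2a52d358464692
$ git cherry-pick fd80dde17346cd45d465326a7f2a52d358464692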

bgruening commented 1 year ago

I started something similar before:

Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,878 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/5/9/b/dataset_59b363e1-b397-4634-9bd1-c64c4a536d6b.dat
Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,878 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: In outputs_to_woking_directory None
Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,878 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/6/9/a/dataset_69ad2920-743a-490d-a5cd-9cb897065b57.dat
Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,879 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: In outputs_to_woking_directory None


Will run your patch next.

bgruening commented 1 year ago

The path is always true.

mvdbeek commented 1 year ago

real_path is always set, but false_path is only set up if outputs_to_working_directory is true. Looks like there is a mismatch somewhere in the life cycle of the job wrapper instance ... very weird. If you have the logs from https://github.com/galaxyproject/galaxy/commit/fd80dde17346cd45d465326a7f2a52d358464692 maybe we can figure this out based on the job destinations.

bgruening commented 1 year ago

Here we go!

Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,538 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/d/5/1/dataset_d51571cd-1700-4dc4-965c-889d254fba94.dat
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners ERROR 2023-04-11 23:41:16,538 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"priority": "-128", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "jobs_directory": "/data/share/staging", "default_file_action": "remote_transfer", "dependency_resolution": "none", "outputs_to_working_directory": "False", "rewrite_parameters": "True", "transport": "curl", "singularity_enabled": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "accounting_group_user": "55103", "description": "fastqc"}, JobIO.outputs_to_working_directory: False
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,549 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: In outputs_to_woking_directory None
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,549 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/7/8/5/dataset_785bd3ed-bf05-445c-a96b-a71a5c0788a3.dat
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners ERROR 2023-04-11 23:41:16,550 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"priority": "-128", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "jobs_directory": "/data/share/staging", "default_file_action": "remote_transfer", "dependency_resolution": "none", "outputs_to_working_directory": "False", "rewrite_parameters": "True", "transport": "curl", "singularity_enabled": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "accounting_group_user": "55103", "description": "fastqc"}, JobIO.outputs_to_working_directory: False
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,566 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: In outputs_to_woking_directory None
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,367 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/5/8/4/dataset_58400a1f-b6f3-4619-8f74-1d5d8e58aeb6.dat
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,367 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/4/d/a/dataset_4da03c7d-fc61-40c7-a201-bff7f096091c.dat
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,553 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/d/f/3/dataset_df34a12f-329a-480c-aed3-c3208fcd2a77.dat
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,554 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/2/c/5/dataset_2c53bc72-b54a-4e71-b471-265bfc4021a1.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,814 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/6/3/9/dataset_639d2300-ae74-4d7b-8a8c-ae7f24d13b3e.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,815 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/1/9/2/dataset_19240ca1-5ef0-4dc5-9bb7-de98a3e76a21.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,987 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/6/0/e/dataset_60e55f05-f12d-47d2-b667-4eeee99e29ec.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,988 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/f/0/9/dataset_f097ab15-4a32-4796-bc4d-87729969e511.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,612 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/4/7/a/dataset_47ab0e9a-bea5-42f2-b585-47b24bc77e61.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,612 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/9/8/d/dataset_98da1401-6931-4b03-997b-145bb40d97bc.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,888 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/5/0/c/dataset_50c21c12-f0b6-4500-a1a7-c432d996132b.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,888 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/e/e/f/dataset_eefe6594-6513-45d8-8af7-6fa8335fa317.dat
Apr 11 23:41:36 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:36,857 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/1/a/a/dataset_1aaf4c77-6e27-4e04-b014-0135234bc82c.dat
Apr 11 23:41:36 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:36,858 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/d/3/0/dataset_d30e64f1-bf6c-48cf-9acb-bb7d8e5381d8.dat
Apr 11 23:41:37 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:37,545 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/4/4/3/dataset_443b5639-3e80-47b5-a5d9-2f2beb9de2b5.dat
Apr 11 23:41:37 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:37,545 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/9/f/a/dataset_9fa1852c-795d-4013-86ca-220355bc76e8.dat
Apr 11 23:41:39 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners WARNING 2023-04-11 23:41:39,988 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/d/5/1/dataset_d51571cd-1700-4dc4-965c-889d254fba94.dat
Apr 11 23:41:39 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners ERROR 2023-04-11 23:41:39,988 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"accounting_group_user": "55103", "default_file_action": "remote_transfer", "dependency_resolution": "none", "description": "fastqc", "jobs_directory": "/data/share/staging", "outputs_to_working_directory": "False", "priority": "-128", "rewrite_parameters": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_enabled": "True", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "transport": "curl"}, JobIO.outputs_to_working_directory: False
Apr 11 23:41:39 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners WARNING 2023-04-11 23:41:39,998 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: In outputs_to_woking_directory None
Apr 11 23:41:40 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners WARNING 2023-04-11 23:41:40,000 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/7/8/5/dataset_785bd3ed-bf05-445c-a96b-a71a5c0788a3.dat
Apr 11 23:41:40 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners ERROR 2023-04-11 23:41:40,001 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"accounting_group_user": "55103", "default_file_action": "remote_transfer", "dependency_resolution": "none", "description": "fastqc", "jobs_directory": "/data/share/staging", "outputs_to_working_directory": "False", "priority": "-128", "rewrite_parameters": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_enabled": "True", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "transport": "curl"}, JobIO.outputs_to_working_directory: False


mvdbeek commented 1 year ago

OMG!!! outputs_to_working_directory is a string ....
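
That would explain the mismatch: job destination parameters arrive as strings (the debug log above shows "outputs_to_working_directory": "False"), and any non-empty string is truthy in Python, so the truthiness check took the outputs-to-working-directory branch even though the destination said False. A quick demonstration:

$ python3 -c 'print(bool("False"), bool(""))'
True False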

mvdbeek commented 1 year ago

https://github.com/galaxyproject/galaxy/pull/15927 should fix this 🤞

bgruening commented 1 year ago

I think we can close this. The majority of tools seem to work now and we can proceed.

Thanks a lot @mvdbeek for all your support!