Closed: mira-miracoli closed this issue 1 year ago
0.14.13 is more than a year old (see https://github.com/galaxyproject/pulsar/compare/0.14.13...0.15.0.dev1 for changes). I would start by updating pulsar, then collect which tools produce errors in which category. We don't need data for all tools, a couple of examples should be sufficient. For each failing tool it'd then also be great to know the pulsar settings, the object store config as well as the relevant job destination parameters (extended metadata, remote metadata etc).
And Galaxy Handlers are not picking up status messages sometimes, which is fixed by a handler restart
When that happens, can you produce a stack dump with py-spy for the handler that is supposed to process messages?
Thank you @mvdbeek I will try to write everything down and update the VGCN Images to 0.14.15
when that happens can you produce a stack dump with py-spy for the handler that is supposed to process messages
Yes I can do that
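For reference, a minimal sketch of taking such a dump once the stuck handler's PID is known (py-spy and its dump subcommand are real; the helper function here is made up for illustration):

```python
import shlex

def py_spy_dump_command(handler_pid: int) -> str:
    # py-spy's "dump" subcommand prints the current stack of every thread
    # in the target process without stopping it; --pid selects the process.
    return shlex.join(["py-spy", "dump", "--pid", str(handler_pid)])

# e.g. after finding the handler's PID with `ps aux | grep handler`:
print(py_spy_dump_command(135946))  # py-spy dump --pid 135946
```

py-spy needs to run as the same user as the handler (or as root) to attach to the process.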
Hi all, I’m trying to find the working configuration of our pulsar endpoint with usegalaxy.eu. Unfortunately, I’m also experiencing multiple behaviours, and I’m unable to identify the problems.
Running a simple fastqc job:
$ ls working/
memory_statement.log output.html output.txt Sc_INPUT_fastq working
Fastqc was actually working fine, but after reinstalling the whole endpoint it no longer is. I can’t really find the problem here.
I see a different behaviour for mulled containers. For instance, a bowtie2 job with paired-end data and sacCer3 reference data from the history.
root@vgcn-exec-node-1-usegalaxy-eu:~$ ps -aux | grep "singularity"
pulsar 117874 0.7 0.1 1630708 59600 ? Sl 01:40 0:01 singularity -s exec --cleanenv -B /data/share/staging/58490602:/data/share/staging/58490602 -B /data/share/staging/58490602/tool_files:/data/share/staging/58490602/tool_files:ro -B /data/share/staging/58490602/outputs:/data/share/staging/58490602/outputs -B /data/share/staging/58490602/working:/data/share/staging/58490602/working --home /data/share/staging/58490602/home:/data/share/staging/58490602/home docker://quay.io/biocontainers/mulled-v2-c742dccc9d8fabfcff2af0d8d6799dbc711366cf:e292900a5ccb65f879af393340758539ef14f345-0 /bin/bash /data/share/staging/58490602/tool_script.sh
root 118153 0.0 0.0 6408 2228 pts/0 R+ 01:43 0:00 grep --color=auto singularity
But I think it is stuck at the image signature stage. If I manually run the command.sh script as the pulsar user, editing the singularity command to capture its output, I get:
INFO: Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
If I manually run the singularity command as the pulsar user from the command line, just to try, bowtie2 runs: https://gist.githubusercontent.com/mtangaro/3a478b5860f7e46f6ed487adbed11131/raw/e9081baf14c677f19e7219cbfa3474203c4627e5/gistfile1.txt
Finally, if I set the job home directory as $HOME, the issue is still there:
pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602$ export HOME=/data/share/staging/58490602/home
pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602$ echo $HOME
/data/share/staging/58490602/home
pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602$ cd working/
pulsar@vgcn-exec-node-1-usegalaxy-eu:/data/share/staging/58490602/working$ singularity -v -s exec --cleanenv -B /data/share/staging/58490602:/data/share/staging/58490602 -B /data/share/staging/58490602/tool_files:/data/share/staging/58490602/tool_files:ro -B /data/share/staging/58490602/outputs:/data/share/staging/58490602/outputs -B /data/share/staging/58490602/working:/data/share/staging/58490602/working --home /data/share/staging/58490602/home:/data/share/staging/58490602/home docker://quay.io/biocontainers/mulled-v2-c742dccc9d8fabfcff2af0d8d6799dbc711366cf:e292900a5ccb65f879af393340758539ef14f345-0 /bin/bash /data/share/staging/58490602/tool_script.sh
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Pulsar configuration details:
Pulsar Endpoint configuration on usegalaxy.eu: https://github.com/usegalaxy-eu/infrastructure-playbook/blob/7d0cfba99562e8ec5eb99c9939d71b9d91f00e43/files/galaxy/dynamic_rules/usegalaxy/destination_specifications.yaml#L737
Pulsar app.yml: https://gist.githubusercontent.com/mtangaro/1ffc7b6f07ba255ac72398401cc69baf/raw/d8192e8b3a56aa60a169436e8c0743a935c56870/gistfile1.txt
local_env.sh: no configurations
OS: RockyLinux 9
Pulsar: 0.14.15
How did you both end up with 0.14.15? That version was only current for a single day and had a major bug.
Because for EuroScienceGateway we are both using the same image with the same pulsar version. Do you recommend 0.14.16 or 0.15.0.dev1 (or both, in two different endpoints)?
I haven't seen any release notes, so I didn't know there was a bug. Which version do you recommend? 0.14.15 or one of the dev versions?
Release notes are here and here, but even without the notes, why not go with the current stable version? The pre-release version updates the k8s co-execution strategy to also work for TES and adds a new option for AMQP routes: https://github.com/galaxyproject/pulsar/compare/0.14.16...0.15.0.dev1#diff-e2de3cbc57ab9b097fca7d7c444c094ba3c1d4ee92b36d972d0d9d49669436cc
and had a major bug
Should we update the changelog then? It only notes "small regression bugs".
It's a small regression. If we consider this bad we can yank the release, but how did anyone end up with a release that was less than 24h old?
Is this some "let's not use the latest version" thinking? Please don't do that, most of what's being added in patch releases are bug fixes.
Hi Marius, again I'm very sorry for using the wrong Pulsar version. The update to 0.14.16 solved the upload issue and fastqc is finally working (again).
The issue with bowtie2, on the other hand, is still there: the job is still stuck. The pulsar log: https://gist.githubusercontent.com/mtangaro/c5b24618beecbad0ab8be51169660d00/raw/e61408ca90a3e847d3d557019f65f2c0c0247528/gistfile1.txt And the process on the exec node:
pulsar 140719 1.1 0.1 1562864 45920 ? Sl 16:10 0:00 singularity -s exec --cleanenv -B /data/share/staging/58540489:/data/share/staging/58540489 -B /data/share/staging/58540489/tool_files:/data/share/staging/58540489/tool_files:ro -B /data/share/staging/58540489/outputs:/data/share/staging/58540489/outputs -B /data/share/staging/58540489/working:/data/share/staging/58540489/working --home /data/share/staging/58540489/home:/data/share/staging/58540489/home docker://quay.io/biocontainers/mulled-v2-c742dccc9d8fabfcff2af0d8d6799dbc711366cf:e292900a5ccb65f879af393340758539ef14f345-0 /bin/bash /data/share/staging/58540489/tool_script.sh
The singularity container is still not running:
$ singularity instance list
INSTANCE NAME PID IP IMAGE
Do you have any idea?
That's a different issue, and probably not related to pulsar. That said:
Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
Does the user that runs the job have that variable set? If it's needed, I would set it in your job destination.
I also see "WARNING: Cache disabled - cache location /opt/pulsar is not writable." with the manual run. Not sure which way you want to go with this, but the cache should be on a fast local disk.
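One way to set that in the job destination: Galaxy's job_conf supports per-destination env tags, which end up in the job's environment. A sketch (the destination id and cache path are placeholders, not the EU config):

```xml
<destination id="pulsar_eu" runner="pulsar">
    <!-- placeholder path: point the container cache at a fast, writable local disk -->
    <env id="APPTAINER_CACHEDIR">/scratch/apptainer_cache</env>
    <env id="SINGULARITY_CACHEDIR">/scratch/apptainer_cache</env>
</destination>
```

Setting both names sidesteps the "SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred" warning on Apptainer-based installs.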
Ok, a bit of progress. With the latest +dev1 version the None in the path is gone. A few tools are now working. More complicated tools like bowtie2 and star return "Job 58583504's output dataset(s) could not be read" in the Galaxy UI.
The above error is with outputs_to_working_directory enabled. If I disable the outputs_to_working_directory setting I get this traceback:
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: Traceback (most recent call last):
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/staging/down.py", line 93, in __collect_working_directory_outputs
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: self.output_files.remove(output_file)
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: ValueError: list.remove(x): x not in list
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: During handling of the above exception, another exception occurred:
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: Traceback (most recent call last):
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/stateful.py", line 223, in do_postprocess
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: postprocess_success = postprocess(job_directory, self.__postprocess_action_executor)
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 23, in postprocess
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: collected = __collect_outputs(job_directory, staging_config, action_executor)
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 38, in __collect_outputs
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: collection_failure_exceptions = results_collector.collect()
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/staging/down.py", line 69, in collect
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: self.__collect_working_directory_outputs()
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/staging/down.py", line 95, in __collect_working_directory_outputs
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: raise Exception("Failed to remove {} from {}".format(output_file, self.output_files))
Apr 06 17:54:14 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[239718]: Exception: Failed to remove None from ['/data/dnb08/galaxy_db/files/1/9/5/dataset_1951953a-c2dc-42d9-ae6e-82ac5a4684e2.dat', '/data/dnb08/galaxy_db/files/a/8/d/dataset_a8dbe145-63a8-48d3-8106-4beeeb0721cc.dat', '/data/dnb08/galaxy_db/files/3/4/e/dataset_34e6f145-8543-4746-bf58-abc5f760a5e1.dat']
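The inner ValueError boils down to the collector computing an output path of None and then trying to list.remove() it from the real dataset paths. A minimal reproduction (paths are placeholders):

```python
output_files = [
    "/data/dnb08/galaxy_db/files/x/y/z/dataset_one.dat",  # placeholder paths
    "/data/dnb08/galaxy_db/files/x/y/z/dataset_two.dat",
]
output_file = None  # a false_path that was never populated upstream

try:
    # list.remove(x) raises ValueError when x is not in the list,
    # which is exactly what happens when x is None
    output_files.remove(output_file)
    removed = True
except ValueError:
    removed = False

print(removed)  # False
```

So the exception is a symptom: the real question is why the output path is None in the first place.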
During debugging I found that we are not setting --pwd in Singularity, which I think would be good, to force a working dir.
I cannot prove it, but I have the feeling that handlers are forgetting pulsar jobs from time to time. I am trying to understand this, but watch out for it if you debug pulsar.
This above error is with outputs_to_working_directory enabled.
that is not a setting that makes sense or is compatible with pulsar, see https://github.com/galaxyproject/pulsar/issues/193#issuecomment-1106377936
During debugging I found that we are not setting --pwd in Singularity, which I think would be good, to force a working dir.
that's not needed for .org and we are running pulsar with singularity (correct me if I'm wrong @natefoo)
I'll take a look at the traceback, it looks like something I already fixed in a different spot of the codebase. Do you know what tool this was ?
This above error is with outputs_to_working_directory enabled.
that is not a setting that makes sense or is compatible with pulsar, see https://github.com/galaxyproject/pulsar/issues/193#issuecomment-1106377936
Yes, I know, but if this is part of our default destinations (non-pulsar settings) I think Pulsar should ignore it. But it seems it actually does something with it.
During debugging I found that we are not setting --pwd in Singularity, which I think would be good, to force a working dir.
that's not needed for .org and we are running pulsar with singularity (correct me if I'm wrong @natefoo)
It's not needed, just a convenience for admins and more explicit. Ignore it, it was just an observation.
I'll take a look at the traceback, it looks like something I already fixed in a different spot of the codebase. Do you know what tool this was?
FASTQC and Star. Funnily, FASTQC works with outputs_to_working_directory=true and turns green, but crashes with outputs_to_working_directory=false.
On EU you have, under user preferences, an option where you can choose your destinations:
(sorry edited your comment instead of replying)
But crashes with outputs_to_working_directory=false
that is probably fixed by https://github.com/galaxyproject/galaxy/pull/15918/commits/ff8835b75771a79922faa209cb65c2c192f015ad ... at least it was flagged in the framework tests
Thanks @mvdbeek I'm happy to test this on EU as soon as you think it's ready.
Thanks @mvdbeek! I have updated Galaxy to the latest 23.0 commit and the pulsar server to dev2.
Star fails with the None error again.
2678]] Failed to execute staging out file /data/share/staging/58682678/working/Log.final.out via FileAction[path=None,action_type=remote_transfer,url=https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a2ff31a8386e3421c&job_key=b269333ccac1e9147600fa818dc8aa4a&path=None&file_type=output_workdir], retrying in 2.0 seconds.
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Traceback (most recent call last):
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/util/retry.py", line 93, in _retry_over_time
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: return fun(*args, **kwargs)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 82, in <lambda>
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: self.action_executor.execute(lambda: action.write_from_path(pulsar_path), description)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/action_mapper.py", line 482, in write_from_path
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: post_file(self.url, pulsar_path)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/transport/curl.py", line 77, in post_file
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: raise Exception(message)
Apr 10 23:24:31 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Exception: Failed to post_file properly for url https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a2ff31a8386e3421c&job_key=b269333ccac1e9147600fa818dc8aa4a&path=None&file_type=output_workdir, remote server returned status code of 500.
Apr 10 23:24:36 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: 2023-04-10 23:24:36,852 INFO [pulsar.managers.util.retry][[manager=production]-[action=postprocess]-[job=58682678]] Failed to execute staging out file /data/share/staging/58682678/working/Log.final.out via FileAction[path=None,action_type=remote_transfer,url=https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a2ff31a8386e3421c&job_key=b269333ccac1e9147600fa818dc8aa4a&path=None&file_type=output_workdir], retrying in 4.0 seconds.
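The path=None in the failing URL is consistent with a None value being string-formatted into the query string rather than rejected before the request is built, e.g.:

```python
from urllib.parse import urlencode

# urlencode() stringifies each value, so a Python None becomes the
# literal text "None" in the query string instead of raising an error.
params = {
    "job_id": "11ac94870d0bb33a2ff31a8386e3421c",  # copied from the log above
    "path": None,                                  # the unpopulated false_path
    "file_type": "output_workdir",
}
print(urlencode(params))  # ...&path=None&file_type=output_workdir
```

The remote Galaxy then sees the string "None" as a path and answers with a 500, which is why the retries can never succeed.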
Fastqc fails similarly with:
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: 2023-04-10 23:28:34,416 INFO [pulsar.managers.util.retry][[manager=production]-[action=postprocess]-[job=58682692]] Failed to execute staging out file /data/share/staging/58682692/working/output.html via FileAction[path=None,action_type=remote_transfer,url=https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a1f919eeaac1d4c23&job_key=b269333ccac1e91404917e3d08c2ca32&path=None&file_type=output_workdir], retrying in 2.0 seconds.
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Traceback (most recent call last):
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/util/retry.py", line 93, in _retry_over_time
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: return fun(*args, **kwargs)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/managers/staging/post.py", line 82, in <lambda>
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: self.action_executor.execute(lambda: action.write_from_path(pulsar_path), description)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/action_mapper.py", line 482, in write_from_path
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: post_file(self.url, pulsar_path)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/transport/curl.py", line 77, in post_file
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: raise Exception(message)
Apr 10 23:28:34 vgcn-central-manager-usegalaxy-eu.garr.cloud.pa pulsar[464405]: Exception: Failed to post_file properly for url https://usegalaxy.eu/_job_files?job_id=11ac94870d0bb33a1f919eeaac1d4c23&job_key=b269333ccac1e91404917e3d08c2ca32&path=None&file_type=output_workdir, remote server returned status code of 500.
it'd be amazing if we could hook up pulsar to sentry ... i wonder why we don't have a path
Is that a feature or deployment wish? ;)
I started work on this 😆
hmm, I can't get it to fail ... which is good and bad 😆. That's still fastqc with https://github.com/usegalaxy-eu/infrastructure-playbook/blob/fc5d2f3438ed6edbc8f8d26e5b96056c1c3e3cd2/files/galaxy/dynamic_rules/usegalaxy/destination_specifications.yaml#L737 ?
The one above. With workingdir = false. You can choose that in your user preferences on EU
It actually all works for me, with or without outputs_to_working_directory, super weird. I'm gonna add the sentry integration, hoping the local variables will help us figure out what's going on.
Let me know if I can do anything to debug this further. As you know we have celery enabled and metadata_strategy=extended.
I got access to one of the pulsar nodes, and I see it fail there. deploying some debugging now.
Hmm, I think my best guess at this point is that in https://github.com/mvdbeek/galaxy/blob/629655d300c4da86ede946a733d3b41f16fb37d0/lib/galaxy/jobs/runners/__init__.py#L337 for some reason you get true for the destination, but the false_path is still None. I have no idea how that would happen ...
Could you try https://github.com/galaxyproject/galaxy/commit/fd80dde17346cd45d465326a7f2a52d358464692? It's possible that the way we build the JobIO instances we may have a reference to the global outputs_to_working_directory setting instead of the one specified in the job destination.
I started something similar before:
Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,878 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/5/9/b/dataset_59b363e1-b397-4634-9bd1-c64c4a536d6b.dat
Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,878 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: In outputs_to_woking_directory None
Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,878 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/6/9/a/dataset_69ad2920-743a-490d-a5cd-9cb897065b57.dat
Apr 11 23:01:56 sn06.galaxyproject.eu python[135946]: galaxy.jobs.runners WARNING 2023-04-11 23:01:56,879 [pN:handler_sn06_4,p:135946,tN:PulsarJobRunner.work_thread-1] BAG-log: In outputs_to_woking_directory None
Will run your patch next.
The path is always true. real_path is always set, but false_path is only being set up if outputs_to_working_directory is true. Looks like there is a mismatch somewhere in the life cycle of the job wrapper instance ... very weird. If you have the logs from https://github.com/galaxyproject/galaxy/commit/fd80dde17346cd45d465326a7f2a52d358464692 maybe we can figure this out based on the job destinations
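A sketch of that life cycle (class and function names are illustrative, not Galaxy's actual code):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatasetPath:
    real_path: str                    # always populated
    false_path: Optional[str] = None  # only populated for working-dir outputs

def build_output_path(real_path: str, outputs_to_working_directory: bool,
                      working_dir: str) -> DatasetPath:
    if outputs_to_working_directory:
        # rewrite the output into the job working directory
        name = real_path.rsplit("/", 1)[-1]
        return DatasetPath(real_path, f"{working_dir}/{name}")
    return DatasetPath(real_path)

# If the paths are built with the flag False but a later stage believes
# the flag is True, it reads false_path and gets None:
p = build_output_path("/data/files/dataset_abc.dat", False, "/staging/working")
print(p.false_path)  # None
```

That None is then what ends up in the FileAction path and the staging URL.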
Here we go!
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,538 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/d/5/1/dataset_d51571cd-1700-4dc4-965c-889d254fba94.dat
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners ERROR 2023-04-11 23:41:16,538 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"priority": "-128", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "jobs_directory": "/data/share/staging", "default_file_action": "remote_transfer", "dependency_resolution": "none", "outputs_to_working_directory": "False", "rewrite_parameters": "True", "transport": "curl", "singularity_enabled": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "accounting_group_user": "55103", "description": "fastqc"}, JobIO.outputs_to_working_directory: False
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,549 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: In outputs_to_woking_directory None
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,549 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/7/8/5/dataset_785bd3ed-bf05-445c-a96b-a71a5c0788a3.dat
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners ERROR 2023-04-11 23:41:16,550 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"priority": "-128", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "jobs_directory": "/data/share/staging", "default_file_action": "remote_transfer", "dependency_resolution": "none", "outputs_to_working_directory": "False", "rewrite_parameters": "True", "transport": "curl", "singularity_enabled": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "accounting_group_user": "55103", "description": "fastqc"}, JobIO.outputs_to_working_directory: False
Apr 11 23:41:16 sn06.galaxyproject.eu python[1296762]: galaxy.jobs.runners WARNING 2023-04-11 23:41:16,566 [pN:handler_sn06_0,p:1296762,tN:PulsarJobRunner.work_thread-2] BAG-log: In outputs_to_woking_directory None
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,367 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/5/8/4/dataset_58400a1f-b6f3-4619-8f74-1d5d8e58aeb6.dat
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,367 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/4/d/a/dataset_4da03c7d-fc61-40c7-a201-bff7f096091c.dat
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,553 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/d/f/3/dataset_df34a12f-329a-480c-aed3-c3208fcd2a77.dat
Apr 11 23:41:30 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:30,554 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-0] BAG-log: /data/dnb08/galaxy_db/files/2/c/5/dataset_2c53bc72-b54a-4e71-b471-265bfc4021a1.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,814 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/6/3/9/dataset_639d2300-ae74-4d7b-8a8c-ae7f24d13b3e.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,815 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/1/9/2/dataset_19240ca1-5ef0-4dc5-9bb7-de98a3e76a21.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,987 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/6/0/e/dataset_60e55f05-f12d-47d2-b667-4eeee99e29ec.dat
Apr 11 23:41:33 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:33,988 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/f/0/9/dataset_f097ab15-4a32-4796-bc4d-87729969e511.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,612 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/4/7/a/dataset_47ab0e9a-bea5-42f2-b585-47b24bc77e61.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,612 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-3] BAG-log: /data/dnb08/galaxy_db/files/9/8/d/dataset_98da1401-6931-4b03-997b-145bb40d97bc.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,888 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/5/0/c/dataset_50c21c12-f0b6-4500-a1a7-c432d996132b.dat
Apr 11 23:41:35 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:35,888 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/e/e/f/dataset_eefe6594-6513-45d8-8af7-6fa8335fa317.dat
Apr 11 23:41:36 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:36,857 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/1/a/a/dataset_1aaf4c77-6e27-4e04-b014-0135234bc82c.dat
Apr 11 23:41:36 sn06.galaxyproject.eu python[1297074]: galaxy.jobs.runners WARNING 2023-04-11 23:41:36,858 [pN:handler_sn06_1,p:1297074,tN:PulsarJobRunner.work_thread-2] BAG-log: /data/dnb08/galaxy_db/files/d/3/0/dataset_d30e64f1-bf6c-48cf-9acb-bb7d8e5381d8.dat
Apr 11 23:41:37 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:37,545 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/4/4/3/dataset_443b5639-3e80-47b5-a5d9-2f2beb9de2b5.dat
Apr 11 23:41:37 sn06.galaxyproject.eu python[1297070]: galaxy.jobs.runners WARNING 2023-04-11 23:41:37,545 [pN:handler_sn06_5,p:1297070,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/9/f/a/dataset_9fa1852c-795d-4013-86ca-220355bc76e8.dat
Apr 11 23:41:39 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners WARNING 2023-04-11 23:41:39,988 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/d/5/1/dataset_d51571cd-1700-4dc4-965c-889d254fba94.dat
Apr 11 23:41:39 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners ERROR 2023-04-11 23:41:39,988 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"accounting_group_user": "55103", "default_file_action": "remote_transfer", "dependency_resolution": "none", "description": "fastqc", "jobs_directory": "/data/share/staging", "outputs_to_working_directory": "False", "priority": "-128", "rewrite_parameters": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_enabled": "True", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "transport": "curl"}, JobIO.outputs_to_working_directory: False
Apr 11 23:41:39 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners WARNING 2023-04-11 23:41:39,998 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: In outputs_to_woking_directory None
Apr 11 23:41:40 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners WARNING 2023-04-11 23:41:40,000 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: /data/dnb08/galaxy_db/files/7/8/5/dataset_785bd3ed-bf05-445c-a96b-a71a5c0788a3.dat
Apr 11 23:41:40 sn06.galaxyproject.eu python[1296805]: galaxy.jobs.runners ERROR 2023-04-11 23:41:40,001 [pN:handler_sn06_2,p:1296805,tN:PulsarJobRunner.work_thread-1] BAG-log: dataset_path.false_path not populated, but outputs to working directory is True. job destination params: {"accounting_group_user": "55103", "default_file_action": "remote_transfer", "dependency_resolution": "none", "description": "fastqc", "jobs_directory": "/data/share/staging", "outputs_to_working_directory": "False", "priority": "-128", "rewrite_parameters": "True", "singularity_default_container_id": "/cvmfs/singularity.galaxyproject.org/u/b/ubuntu:18.04", "singularity_enabled": "True", "singularity_volumes": "$job_directory:rw,$tool_directory:ro,$job_directory/outputs:rw,$working_directory:rw", "submit_request_cpus": "8", "submit_request_memory": "4.0G", "transport": "curl"}, JobIO.outputs_to_working_directory: False
With:
OMG!!! outputs_to_working_directory is a string ....
https://github.com/galaxyproject/galaxy/pull/15927 should fix this 🤞
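The trap is plain Python truthiness: the destination parameter arrives as the string "False", and a bare if outputs_to_working_directory: check treats any non-empty string as true. A boolean parser in the spirit of Galaxy's asbool helper (sketched here, not the actual implementation) gives the intended value:

```python
def asbool(value) -> bool:
    # Strings must be parsed explicitly: bool("False") is True,
    # because any non-empty string is truthy in Python.
    if isinstance(value, str):
        return value.strip().lower() in ("true", "yes", "on", "1")
    return bool(value)

print(bool("False"))    # True  <- the bug
print(asbool("False"))  # False <- the intended value
```

This matches the earlier logs, where the destination params show "outputs_to_working_directory": "False" as a string while JobIO.outputs_to_working_directory reports False.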
I think we can close this. The majority of tools seem to work now and we can proceed I think.
Thanks a lot @mvdbeek for all your support!
I observe mixed behaviour in my Pulsar endpoint:
working/out
Currently I am unable to spot any pattern here or any reliable setting that works. I will try to observe it more as I change parts around it. On top of that, some tools are not working as expected, e.g. ENAsearch is producing wrong output for XML. And Galaxy Handlers are not picking up status messages sometimes, which is fixed by a handler restart
OS: RockyLinux 9.1
Pulsar: pulsar-app==0.14.13
Galaxy: 23.0europe