Open hexylena opened 3 weeks ago
Changing the datatype of the output in the tool as it runs in a workflow. This has no effect.
There are a lot of tests for this in the codebase, here's one that I just put together: https://usegalaxy.org/u/marius/w/change-collection-datatype
It is possible that you have a traceback somewhere in your logs, in that case it would be good if you can post that.
The change datatype tab is completely missing?
You'll need celery for changing datatypes in batch. If you don't have Celery that tab isn't shown.
Celery was configured (I definitely forgot that was a requirement for that!)
gravity:
  celery:
    concurrency: 2
    loglevel: DEBUG
and seems to be processing jobs
galaxyctl[162059]: [2024-10-28 13:50:13,755: INFO/main] Task galaxy.dispatch_pending_notifications[ff410dcc-8d40-4899-a1ce-0c3896bad719] succeeded in 0.018974624574184418s: None
galaxyctl[162059]: [2024-10-28 13:55:14,394: INFO/main] Task galaxy.clean_object_store_caches[cc41221d-88bb-4050-9ff4-82563ecae6dc] received
galaxyctl[162059]: [2024-10-28 13:55:14,394: DEBUG/main] TaskPool: Apply <function fast_trace_task at 0x7f9e489e13a0> (args:('galaxy.clean_object_store_caches', 'cc41221d-88bb-4050-9ff4-82563ecae6dc', {'lang': 'py', 'task': 'galaxy.clean_object_store_caches', 'id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen162008@bioinf-galaxy.erasmusmc.nl', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}, 'properties': {'correlation_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'reply_to': '7b208dab-a6ca-3fbd-a45d-f39e3521de2f', 'delivery_mode': 2, 'delivery_info': {'exchange': '', 'routing_key': 'galaxy.internal'}, 'priority': 0, 'body_encoding': 'base64', 'delivery_tag': '6a54628e-81e9-4ffa-81e0-7774b9889fc4'}, 'reply_to': '7b208dab-a6ca-3fbd-a45d-f39e3521de2f', 'correlation_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'hostname':... kwargs:{})
galaxyctl[162059]: [2024-10-28 13:55:14,398: INFO/main] Successfully executed Celery task clean_object_store_caches to prune object store cache directories clean_object_store_caches to prune object store cache directories (0.132 ms)
galaxyctl[162059]: [2024-10-28 13:55:14,794: INFO/main] Task galaxy.clean_object_store_caches[cc41221d-88bb-4050-9ff4-82563ecae6dc] succeeded in 0.39791450649499893s: None
the tracebacks all look pretty normal:
$ journalctl -u galaxy-gunicorn --since '1 day ago' | grep Traceback -A50 | egrep '(Exception|Error)' | cut -c 44-
galaxyctl[108675]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[109689]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[109691]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[158149]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[159334]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[159337]: galaxy.tool_util.toolbox.base ERROR 2024-10-28 10:55:04,276 [pN:main.2,p:159337,tN:Thread-2] Error reading tool from path: phenotype_association/sift.xml
galaxyctl[159337]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[161842]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162844]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]: galaxy.tool_util.toolbox.base ERROR 2024-10-28 10:57:09,667 [pN:main.2,p:162846,tN:Thread-2] Error reading tool from path: phenotype_association/sift.xml
galaxyctl[162846]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162844]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]: raise RequestParameterInvalidException(
galaxyctl[162846]: galaxy.exceptions.RequestParameterInvalidException: Extension 'txt,tabular' unknown, cannot use dataset collection as input
You need to enable celery in the galaxy config (the jobs you listed are cron-style jobs), and
galaxy.exceptions.RequestParameterInvalidException: Extension 'txt,tabular' unknown, cannot use dataset collection as input
explains the second part.
You need to enable celery in the galaxy config
Right, there are multiple Celery toggles. Yes, you're right, I'm missing enable_celery_tasks.
this needs to be communicated more usefully to the user/admin, I think? e.g. showing the tab but disabling it and having a tooltip of "please enable celery tasks in your galaxy.yml to allow changing datatypes of a collection" would have potentially removed this issue completely.
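For anyone landing here with the same symptom, the two toggles discussed above can be sketched roughly as follows. This is a minimal galaxy.yml fragment, not a complete config: the gravity section mirrors what was posted earlier in this thread, and the galaxy section adds the enable_celery_tasks option that was missing. The values are illustrative, and other settings your instance needs are omitted.

```yaml
# galaxy.yml (sketch, illustrative values only)
gravity:
  celery:
    # Worker settings, as posted earlier in this thread
    concurrency: 2
    loglevel: DEBUG

galaxy:
  # The toggle that was missing here: with only the gravity section set,
  # the worker ran the periodic (cron-style) jobs but the
  # "change datatype" tab for collections was not shown.
  enable_celery_tasks: true
```

Galaxy presumably needs a restart for the change to take effect.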
explains the second part.
I'm not sure it does? That was unrelated testing on the same dataset, and I triggered it by trying to extract element identifiers from that collection (which required manually dragging it into the form, hence I didn't report that), though I can see how it looks related.
Is there a reason not to default enable_celery_tasks at this point? The two documented install and run methods (Ansible, Gravity via run.sh or directly) make sure you have a running Celery, and there are more and more parts of Galaxy that don't function without it.
Also, SQLAlchemy can be used as a results backend, is there any reason not to have Galaxy use it as the default if you don't specify something else (e.g. redis)?
Regarding this, the default is now using a simple SQLite database as the results backend https://github.com/galaxyproject/galaxy/pull/17949
I think the main concern about enabling it by default was the user rate limiting issue, but if I remember correctly it was fixed some time ago, so probably we could enable it by default at this point.
the default is now using a simple SQLite database as the results backend
couldn't/shouldn't this default to using whatever the database connection is? so it could default to postgres when that's in use?
either way would be great to have this enabled by default!! (or any notification to the end user that this feature is available in galaxy but disabled due to administrator (mis)configuration)
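Until the default changes, an admin who wants the results backend in PostgreSQL could probably set it explicitly. The fragment below is a sketch under assumptions: the user, password, host and database name are hypothetical placeholders, and the "db+" prefix is Celery's URL convention for its SQLAlchemy result backend. Whether celery_conf is merged with Galaxy's defaults or replaces them is not covered in this thread, so check the sample config for your release.

```yaml
# galaxy.yml (sketch; connection details are hypothetical placeholders)
galaxy:
  enable_celery_tasks: true
  celery_conf:
    # "db+" followed by a SQLAlchemy URL selects Celery's database result backend.
    # Replace user, password, host and database name with your own.
    result_backend: db+postgresql://galaxy:CHANGE_ME@localhost/galaxy
```

This only moves task results into PostgreSQL. It does not configure the broker.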
For 25.0 we'll consider enabling Celery by default.
Describe the bug
I'm trying to work around https://github.com/galaxyproject/tools-iuc/pull/6493 which produces a collection labelled txt,tabular.
Galaxy Version and/or server at which you observed the bug
The Galaxy Server is running version 24.1.4.dev0 , and the web client was built on Saturday Oct 26th 10:07:28 2024 GMT+2 . Commit: ccf4353f09cc92ad6cb1e01eaf8022da9fed822e
Browser and Operating System
Operating System: Linux
Browser: Chrome
To Reproduce
Steps to reproduce the behavior:
I've tried two solutions:
Expected behavior
I can change the datatype, either via the workflow or manually afterwards, to work around issues.
Screenshots
Additional context
potentially xref #17734