galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Cannot change datatype of a collection #19064

Open hexylena opened 3 weeks ago

hexylena commented 3 weeks ago

Describe the bug

I'm trying to work around https://github.com/galaxyproject/tools-iuc/pull/6493 which produces a collection labelled txt,tabular.

Galaxy Version and/or server at which you observed the bug

The Galaxy Server is running version 24.1.4.dev0, and the web client was built on Saturday Oct 26th 10:07:28 2024 GMT+2. Commit: ccf4353f09cc92ad6cb1e01eaf8022da9fed822e

Browser and Operating System

Operating System: Linux
Browser: Chrome

To Reproduce

Steps to reproduce the behavior:

I've tried two solutions:

Expected behavior

I can change the datatype, either via the workflow or manually afterwards, to work around issues.

Additional context

potentially xref #17734

mvdbeek commented 3 weeks ago

> Changing the datatype of the output in the tool as it runs in a workflow. This has no effect.

There are a lot of tests for this in the codebase, here's one that I just put together: https://usegalaxy.org/u/marius/w/change-collection-datatype

It is possible that you have a traceback somewhere in your logs; in that case, it would be good if you could post it.

> The change datatype tab is completely missing?

You'll need celery for changing datatypes in batch. If you don't have Celery that tab isn't shown.

hexylena commented 3 weeks ago

Celery was configured (I definitely forgot that was a requirement for that!)

gravity:
    celery:
        concurrency: 2
        loglevel: DEBUG

and seems to be processing jobs

galaxyctl[162059]: [2024-10-28 13:50:13,755: INFO/main] Task galaxy.dispatch_pending_notifications[ff410dcc-8d40-4899-a1ce-0c3896bad719] succeeded in 0.018974624574184418s: None
galaxyctl[162059]: [2024-10-28 13:55:14,394: INFO/main] Task galaxy.clean_object_store_caches[cc41221d-88bb-4050-9ff4-82563ecae6dc] received
galaxyctl[162059]: [2024-10-28 13:55:14,394: DEBUG/main] TaskPool: Apply <function fast_trace_task at 0x7f9e489e13a0> (args:('galaxy.clean_object_store_caches', 'cc41221d-88bb-4050-9ff4-82563ecae6dc', {'lang': 'py', 'task': 'galaxy.clean_object_store_caches', 'id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen162008@bioinf-galaxy.erasmusmc.nl', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}, 'properties': {'correlation_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'reply_to': '7b208dab-a6ca-3fbd-a45d-f39e3521de2f', 'delivery_mode': 2, 'delivery_info': {'exchange': '', 'routing_key': 'galaxy.internal'}, 'priority': 0, 'body_encoding': 'base64', 'delivery_tag': '6a54628e-81e9-4ffa-81e0-7774b9889fc4'}, 'reply_to': '7b208dab-a6ca-3fbd-a45d-f39e3521de2f', 'correlation_id': 'cc41221d-88bb-4050-9ff4-82563ecae6dc', 'hostname':... kwargs:{})
galaxyctl[162059]: [2024-10-28 13:55:14,398: INFO/main] Successfully executed Celery task clean_object_store_caches to prune object store cache directories clean_object_store_caches to prune object store cache directories (0.132 ms)
galaxyctl[162059]: [2024-10-28 13:55:14,794: INFO/main] Task galaxy.clean_object_store_caches[cc41221d-88bb-4050-9ff4-82563ecae6dc] succeeded in 0.39791450649499893s: None

the tracebacks all look pretty normal:

$ journalctl -u galaxy-gunicorn --since '1 day ago' | grep Traceback -A50 | egrep '(Exception|Error)' | cut -c 44-
galaxyctl[108675]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[109689]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[109691]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[158149]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[159334]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[159337]: galaxy.tool_util.toolbox.base ERROR 2024-10-28 10:55:04,276 [pN:main.2,p:159337,tN:Thread-2] Error reading tool from path: phenotype_association/sift.xml
galaxyctl[159337]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[161842]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162844]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]: galaxy.tool_util.toolbox.base ERROR 2024-10-28 10:57:09,667 [pN:main.2,p:162846,tN:Thread-2] Error reading tool from path: phenotype_association/sift.xml
galaxyctl[162846]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162844]: AssertionError: File /srv/galaxy/var/tool-data/sift_db.loc specified by the 'filename' attribute not found
galaxyctl[162846]:     raise RequestParameterInvalidException(
galaxyctl[162846]: galaxy.exceptions.RequestParameterInvalidException: Extension 'txt,tabular' unknown, cannot use dataset collection as input

mvdbeek commented 3 weeks ago

You need to enable celery in the galaxy config (the jobs you listed are cron-style jobs), and

galaxy.exceptions.RequestParameterInvalidException: Extension 'txt,tabular' unknown, cannot use dataset collection as input

explains the second part.
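The exception quoted above suggests that the comma-joined label `txt,tabular` is being looked up as if it were a single extension. A minimal sketch of that failure mode follows. It is an illustration only, not Galaxy's code: `KNOWN_EXTENSIONS`, `resolve_extension`, and `split_label` are hypothetical names, and the two-entry set stands in for a real datatype registry.

```python
# Hypothetical stand-in for a datatype registry; not Galaxy's implementation.
KNOWN_EXTENSIONS = {"txt", "tabular"}


def resolve_extension(ext: str) -> str:
    """Return the extension if it is registered, otherwise raise ValueError."""
    if ext not in KNOWN_EXTENSIONS:
        raise ValueError(f"Extension '{ext}' unknown")
    return ext


def split_label(label: str) -> list:
    """Split a comma-joined label such as 'txt,tabular' into its parts."""
    return [part.strip() for part in label.split(",") if part.strip()]


# Each part resolves on its own...
parts = [resolve_extension(part) for part in split_label("txt,tabular")]

# ...but the joined label is not a registered extension.
try:
    resolve_extension("txt,tabular")
    joined_is_known = True
except ValueError:
    joined_is_known = False
```

Whether this is what actually happens inside Galaxy for this collection is not established in the thread; the sketch only shows why a mixed-type label cannot match a single registry entry.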

hexylena commented 3 weeks ago

> You need to enable celery in the galaxy config

Right, there are multiple Celery toggles. Yes, you're right, I'm missing enable_celery_tasks.
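The distinction between the two toggles can be sketched as a small check over the parsed configuration. This is a hedged illustration, not part of Galaxy or Gravity: `missing_celery_settings` is a hypothetical helper, the config is assumed to be already loaded into a plain dict, `enable_celery_tasks` is assumed to default to off (as discussed in this thread), and the `gravity.celery.enable` key is an assumption about Gravity's schema.

```python
from typing import Any, Dict, List


def missing_celery_settings(config: Dict[str, Any]) -> List[str]:
    """List the settings that would keep Celery-backed features disabled.

    Hypothetical helper: `config` is the parsed galaxy.yml as a plain dict.
    """
    galaxy = config.get("galaxy") or {}
    gravity = config.get("gravity") or {}
    missing = []
    # Galaxy itself must be told to dispatch work to Celery
    # (assumed to be off unless set explicitly).
    if not galaxy.get("enable_celery_tasks", False):
        missing.append("galaxy.enable_celery_tasks")
    # Gravity only manages the worker process; `enable` is an assumed key
    # that is treated as on unless explicitly set to False.
    celery = gravity.get("celery") or {}
    if celery.get("enable", True) is False:
        missing.append("gravity.celery.enable")
    return missing


# The configuration reported above: a worker is configured, but Galaxy
# has not been told to use it.
reported = {"gravity": {"celery": {"concurrency": 2, "loglevel": "DEBUG"}}}
print(missing_celery_settings(reported))
```

With the configuration from this thread the helper reports only `galaxy.enable_celery_tasks`, which matches the diagnosis: the worker runs its periodic jobs, but user-triggered tasks are never sent to it.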

this needs to be communicated more usefully to the user/admin, I think? e.g. showing the tab but disabling it and having a tooltip of "please enable celery tasks in your galaxy.yml to allow changing datatypes of a collection" would have potentially removed this issue completely.
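The suggestion above, keeping the tab visible but disabled with an explanatory tooltip, could look roughly like the following. This is only a sketch of the decision logic in Python: the Galaxy client is not written this way, and `TabState` and `change_datatype_tab` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TabState:
    """Hypothetical description of how the tab should be rendered."""
    visible: bool
    enabled: bool
    tooltip: str


def change_datatype_tab(celery_tasks_enabled: bool) -> TabState:
    """Always show the tab; disable it and explain why when tasks are off."""
    if celery_tasks_enabled:
        return TabState(visible=True, enabled=True, tooltip="")
    return TabState(
        visible=True,
        enabled=False,
        tooltip=(
            "Please enable celery tasks in your galaxy.yml "
            "to allow changing datatypes of a collection"
        ),
    )
```

The design choice is that the feature stays discoverable: a user or admin sees that it exists and what is needed to turn it on, instead of the tab silently disappearing.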

> explains the second part.

I'm not sure it does? that was unrelated testing on the same dataset and i triggered it by trying to extract element identifiers from that collection (which required manually dragging it into the form, hence I didn't report that) though I can see how it looks related

natefoo commented 2 weeks ago

Is there a reason not to default enable_celery_tasks at this point? The two documented install and run methods (Ansible, Gravity via run.sh or directly) make sure you have a running Celery, and there are more and more parts of Galaxy that don't function without it.

Also, SQLAlchemy can be used as a results backend, is there any reason not to have Galaxy use it as the default if you don't specify something else (e.g. redis)?

davelopez commented 2 weeks ago

> Also, SQLAlchemy can be used as a results backend, is there any reason not to have Galaxy use it as the default if you don't specify something else (e.g. redis)?

Regarding this, the default now uses a simple SQLite database as the results backend: https://github.com/galaxyproject/galaxy/pull/17949

I think the main concern about enabling it by default was the user rate limiting issue, but if I remember correctly it was fixed some time ago, so probably we could enable it by default at this point.

hexylena commented 2 weeks ago

> the default is now using a simple SQLite database as the results backend

couldn't/shouldn't this default to using whatever the database connection is? so it could default to postgres when that's in use?
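As a rough sketch of what that could mean: Celery's SQLAlchemy result backend takes an ordinary SQLAlchemy URL prefixed with `db+`, so a default could in principle be derived from the existing connection string. `default_result_backend` is a hypothetical helper, the connection string in the example is made up, and this is not what the linked PR does (it uses SQLite).

```python
def default_result_backend(database_connection: str) -> str:
    """Derive a Celery result backend URL from a SQLAlchemy connection string.

    Hypothetical helper: Celery's database backend expects the SQLAlchemy
    URL with a 'db+' prefix, so the prefix is added unless already present.
    """
    if database_connection.startswith("db+"):
        return database_connection
    return "db+" + database_connection


# Made-up connection string for illustration only.
print(default_result_backend("postgresql://galaxy@localhost/galaxy"))
```

Whether sharing the main database with the results backend is desirable (load, schema ownership, cleanup of old results) is a separate question that the sketch does not answer.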

either way would be great to have this enabled by default!! (or any notification to the end user that this feature is available in galaxy but disabled due to administrator (mis)configuration)

jdavcs commented 2 weeks ago

For 25.0 we'll consider enabling Celery by default.