galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Infinite loop while installing data managers #14577

Open: innovate-invent opened this issue 2 years ago

innovate-invent commented 2 years ago

https://github.com/galaxyproject/galaxy/blob/70b2ea429c6231a87eaec7f6e8a3bc295fd1e21e/lib/galaxy/tool_shed/galaxy_install/tools/data_manager.py#L167-L168

This loop never exits because both operands are always 0, so the condition 0 <= 0 is always true.
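To make the failure mode concrete, here is a minimal sketch of the same polling pattern, with a deadline added so it cannot hang. DataManagersStub and wait_for_reload are hypothetical names for illustration only; the stub models nothing but the _reload_count counter, and this is not Galaxy code.

```python
import time


class DataManagersStub:
    """Hypothetical stand-in for app.data_managers; models only the counter."""

    def __init__(self) -> None:
        self._reload_count = 0


def wait_for_reload(data_managers, reload_count, timeout=2.0, interval=0.1):
    """Poll like the original loop, but give up after `timeout` seconds.

    Returns True if the counter moved past `reload_count`, False on timeout.
    """
    deadline = time.monotonic() + timeout
    # Same condition as the original loop: wait until the counter increases.
    while data_managers._reload_count <= reload_count:
        if time.monotonic() >= deadline:
            return False  # the original loop has no deadline and spins forever here
        time.sleep(interval)
    return True


dm = DataManagersStub()
snapshot = dm._reload_count  # 0
# Nothing increments the counter, so 0 <= 0 stays true until the deadline.
print(wait_for_reload(dm, snapshot, timeout=0.3))  # False
```

A deadline only turns the hang into a detectable failure; it does not explain why the counter never moves.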

innovate-invent commented 2 years ago

Digging further, it appears that this loop never exits when watch_core_config is false, which is the default: https://github.com/galaxyproject/galaxy/blob/70b2ea429c6231a87eaec7f6e8a3bc295fd1e21e/lib/galaxy/config/sample/galaxy.yml.sample#L495

Edit: Enabling it helped somewhat. However, looking back at the logs from before I enabled watch_core_config, reload_data_managers was already being called:

galaxy.queue_worker INFO 2022-09-06 19:20:19,897 [pN:main,p:1,tN:Thread-4] Sending reload_data_managers control task.
galaxy.queue_worker INFO 2022-09-06 19:20:20,848 [pN:main,p:1,tN:Thread-1] Instance 'main' received 'reload_data_managers' task, executing now.
...
galaxy.queue_worker DEBUG 2022-09-06 19:20:20,885 [pN:main,p:1,tN:Thread-1] Data managers reloaded (36.079 ms)

Now the install sometimes succeeds.

I wonder whether the waiting code keeps a stale reference to the data managers object when this happens: https://github.com/galaxyproject/galaxy/blob/70b2ea429c6231a87eaec7f6e8a3bc295fd1e21e/lib/galaxy/queue_worker.py#L197-L199

I logged id(self.app.data_managers) inside the loop and the id never changed. I did notice, however, that "Data managers reloaded" was logged before the id logging began. This points to a race between reload_data_managers and the loop: the reload finishes before execution reaches https://github.com/galaxyproject/galaxy/blob/70b2ea429c6231a87eaec7f6e8a3bc295fd1e21e/lib/galaxy/tool_shed/galaxy_install/tools/data_manager.py#L166, so the loop ends up waiting for a reload that has already happened.
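The race can be reproduced in a simplified, hypothetical model: a counter snapshot followed by a poll. This is a sketch under stated assumptions, not Galaxy code. DataManagersStub, install and the reload_finishes_first flag are invented for illustration, and a threading.Timer stands in for the watcher thread. When the reload lands after the snapshot, the poll sees the increment. When it lands before, the snapshot already includes it and the poll waits for a reload that never comes.

```python
import threading
import time


class DataManagersStub:
    """Hypothetical model of app.data_managers: each reload bumps a counter."""

    def __init__(self):
        self._reload_count = 0
        self._lock = threading.Lock()

    def reload(self):
        with self._lock:
            self._reload_count += 1


def install(dm, reload_finishes_first, timeout=0.5):
    """Return True if the wait loop observed a reload, False if it timed out."""
    if reload_finishes_first:
        # The reload completes before the snapshot is taken...
        dm.reload()
        reload_count = dm._reload_count  # ...so the snapshot already includes it.
    else:
        reload_count = dm._reload_count
        # Simulate the watcher thread picking up the change slightly later.
        threading.Timer(0.05, dm.reload).start()
    deadline = time.monotonic() + timeout
    while dm._reload_count <= reload_count:
        if time.monotonic() >= deadline:
            return False  # the original loop has no deadline and would hang here
        time.sleep(0.01)
    return True


print(install(DataManagersStub(), reload_finishes_first=False))  # True
print(install(DataManagersStub(), reload_finishes_first=True))   # False
```

The second call shows why snapshot-then-poll is fragile: correctness depends on the reload being scheduled strictly after the snapshot, which nothing enforces.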

innovate-invent commented 2 years ago

For reference, @mvdbeek created a working fix:

diff --git a/lib/galaxy/tool_shed/galaxy_install/tools/data_manager.py b/lib/galaxy/tool_shed/galaxy_install/tools/data_manager.py
index 466a274337..0decccbeb1 100644
--- a/lib/galaxy/tool_shed/galaxy_install/tools/data_manager.py
+++ b/lib/galaxy/tool_shed/galaxy_install/tools/data_manager.py
@@ -162,10 +162,8 @@ class DataManagerHandler:
                 data_manager_config_has_changes = True
             # Persist the altered shed_data_manager_config file.
             if data_manager_config_has_changes:
-                reload_count = self.app.data_managers._reload_count
                 self.data_manager_config_elems_to_xml_file(config_elems, shed_data_manager_conf_filename)
-                while self.app.data_managers._reload_count <= reload_count:
-                    time.sleep(0.1)  # Wait for shed_data_manager watcher thread to pick up changes
+                self.app.queue_worker.send_control_task("reload_data_managers", get_response=True)
         return rval

     def remove_from_data_manager(self, repository):
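The fix drops the counter snapshot and instead sends reload_data_managers with get_response=True, presumably blocking until the task has run. The sketch below illustrates that request/response idea in a single process. ControlWorker is a hypothetical stand-in written for this example and does not reproduce Galaxy's queue_worker implementation; only the task name and the get_response parameter come from the diff.

```python
import queue
import threading


class ControlWorker:
    """Hypothetical in-process stand-in for a control-task queue worker.

    Tasks run on a background thread. With get_response=True the caller
    blocks until the handler has finished, so there is no counter to
    snapshot and no window for the race described above.
    """

    def __init__(self, handlers):
        self._handlers = handlers
        self._tasks = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name, reply = self._tasks.get()
            result = self._handlers[name]()
            if reply is not None:
                reply.put(result)  # only sent after the handler has completed

    def send_control_task(self, name, get_response=False, timeout=5.0):
        reply = queue.Queue(maxsize=1) if get_response else None
        self._tasks.put((name, reply))
        if reply is None:
            return None  # fire-and-forget: the task may not have run yet
        # Blocks until the worker replies; raises queue.Empty on timeout.
        return reply.get(timeout=timeout)


state = {"reload_count": 0}


def reload_data_managers():
    state["reload_count"] += 1
    return "Data managers reloaded"


worker = ControlWorker({"reload_data_managers": reload_data_managers})
print(worker.send_control_task("reload_data_managers", get_response=True))
print(state["reload_count"])  # 1: the reload is guaranteed done once the call returns
```

Waiting on a reply tied to this specific request makes the ordering explicit, unlike comparing against a shared counter that other reloads can also move.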