SlideRuleEarth / sliderule-prov-sys

Provisioning System for Slide Rule clusters
BSD 3-Clause "New" or "Revised" License

Executing an API call for desired num nodes at the same time as a Web update causes the command queue to hang #40

Closed (cugarteblair closed this issue 1 year ago)

cugarteblair commented 1 year ago

Executing an API call to set the desired number of nodes at the same time as a Web update causes the command queue to hang.

cugarteblair commented 1 year ago

It turns out the command queue was somehow deleted. One thing to note is that the first attempt to create the organization failed due to a bug in ps_server, where it was using the legacy bucket to create a workspace with terraform files (issue ICESat2-SlideRule/sliderule-prov-sys#41).

Here is the log: log_for_utexas_problem.log

Here is the celery queue dump:

image

Here is the redis queue dump:

image

Notice the anomalous entry for utexas in redis, and that utexas is missing altogether from celery.

cugarteblair commented 1 year ago

After some analysis, I believe this was caused by the error in ICESat2-SlideRule/sliderule-prov-sys#41. When the exception occurred during the create org processing, the create_queue procedure was never called, so the organization was left without a command queue.
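To illustrate the failure mode described above, here is a minimal sketch of how an exception partway through organization creation can skip the queue setup. Only the name create_queue comes from this issue. Everything else (the `queues` set, `create_workspace`, the `"legacy"` bucket value, both `create_org_*` functions) is a hypothetical stand-in and not the actual ps_server code. Reordering the steps is shown as one possible mitigation, not necessarily the fix that was applied.

```python
# Hypothetical stand-ins; none of this is the real ps_server code.
queues = set()  # names of organizations that currently have a command queue


def create_queue(org_name):
    """Stand-in for the create_queue procedure named in this issue."""
    queues.add(org_name)


def create_workspace(org_name, bucket):
    """Stand-in for the terraform workspace setup; fails on the legacy bucket."""
    if bucket == "legacy":
        raise RuntimeError(f"cannot create workspace for {org_name} in legacy bucket")


def create_org_fragile(org_name, bucket):
    # Queue creation comes last, so an exception in an earlier step skips it.
    create_workspace(org_name, bucket)
    create_queue(org_name)


def create_org_safer(org_name, bucket):
    # Create the queue first so it exists even if a later step raises.
    create_queue(org_name)
    create_workspace(org_name, bucket)


def try_create(create_fn, org_name, bucket):
    """Run a create function and report whether the org ended up with a queue."""
    try:
        create_fn(org_name, bucket)
    except RuntimeError:
        pass  # a real system would log the failure here
    return org_name in queues


fragile_has_queue = try_create(create_org_fragile, "org_a", "legacy")
safer_has_queue = try_create(create_org_safer, "org_b", "legacy")
print(fragile_has_queue, safer_has_queue)  # prints: False True
```

In the fragile ordering the organization ends up with no queue, so any command later sent to it has nowhere to go, which matches the hang reported here.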