Closed barryprice closed 6 months ago
It seems like the behaviour must be logged somewhere (and if it isn't, perhaps there's a logging level we can adjust). There's juju debug-log, the workload container logs, and logs in /srv/discourse/app/log in the discourse container as well. If we can get an exact time/instance where this happens and can look at the logs relatively quickly (before they rotate out or the pod is restarted again), we should have enough info to figure out what's happening.
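Once a time window is known, narrowing saved debug-log output down to the ERROR lines inside it could look roughly like the sketch below. It assumes the lines have the shape seen in this issue ("entity: date time LEVEL module message"); the function name errors_between and the regex are illustrative, not part of Juju or the charm.

```python
import re
from datetime import datetime

# Matches header lines shaped like the debug-log output quoted in this issue:
# "<entity>: <YYYY-MM-DD HH:MM:SS> <LEVEL> <module> <message>"
LOG_LINE = re.compile(
    r"^(?P<entity>\S+): "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<module>\S+) "
    r"(?P<message>.*)$"
)


def errors_between(lines, start, end, entity=None):
    """Return (timestamp, entity, message) for ERROR lines within [start, end]."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # traceback continuation lines carry no header
        if match["level"] != "ERROR":
            continue
        if entity is not None and match["entity"] != entity:
            continue
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        if start <= ts <= end:
            hits.append((ts, match["entity"], match["message"]))
    return hits
```

Feeding it the lines of a saved log file plus a start and end datetime returns only the matching error headers, which is usually enough to find the surrounding traceback before the logs rotate out.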
Saw this in the logs:
unit-discourse-k8s-1: 2024-04-16 03:59:43 ERROR unit.discourse-k8s/1.juju-log S3 migration failed with code 1.
Traceback (most recent call last):
File "./src/charm.py", line 606, in _run_s3_migration
process.wait_output()
File "/var/lib/juju/agents/unit-discourse-k8s-1/charm/venv/ops/pebble.py", line 1441, in wait_output
raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/srv/discourse/app/bin/bundle', 'exec', 'rake', 's3:upload_assets'], stdout='gem install rrule -v 0.4.4 -i /srv/discourse/app/plugins/discourse-calendar/gems/3.2.2 --no-document --ignore-dependencies
--no-user-install\nSuccessfully installed rrule-0.4.4\n1 gem installed\ngem install webrick -v 1.7.0 -i /srv/discourse/app/plugins/discourse-prometheus/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nSuccessfully installed webrick-1.7.
0\n1 gem installed\ngem install prometheus_exporter -v 2.0.6 -i /srv/discourse/app/plugins/discourse-prometheus/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nprometheus_exporter will only bind to localhost by default as of v0.5\nSucce
ssfully installed prometheus_exporter-2.0.6\n1 gem installed\ngem install macaddr -v 1.0.0 -i /srv/discourse/app/plugins/discourse-saml/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nSuccessfully installed macaddr-1.0.0\n1 gem installe
d\ngem install uuid -v 2.3.7 -i /srv/discourse/app/plugins/discourse-saml/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nSuccessfully i' [truncated], stderr="Couldn't connect to Redis\nrake aborted!\nRedis::CannotConnectError: Error co
nnecting to Redis on 192.168.101.92:6379 (Redis::TimeoutError)\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:398:in `rescue in establish_connection'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/red
is/client.rb:379:in `establish_connection'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:115:in `block in connect'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:344:in `with_reconnec
t'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:114:in `connect'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:409:in `ensure_connected'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0
/gems/redis-4.8.1/lib/redis/client.rb:269:in `block in process'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:356:in `logging'\n/srv/discourse/app/vendor/bundle" [truncated]
unit-discourse-k8s-1: 2024-04-16 03:59:43 ERROR unit.discourse-k8s/1.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
File "./src/charm.py", line 791, in <module>
main(DiscourseCharm, use_juju_for_storage=True)
File "/var/lib/juju/agents/unit-discourse-k8s-1/charm/venv/ops/main.py", line 436, in main
_emit_charm_event(charm, dispatcher.event_name)
File "/var/lib/juju/agents/unit-discourse-k8s-1/charm/venv/ops/main.py", line 144, in _emit_charm_event
event_to_emit.emit(*args, **kwargs)
File "/var/lib/juju/agents/unit-discourse-k8s-1/charm/venv/ops/framework.py", line 351, in emit
framework._emit(event)
File "/var/lib/juju/agents/unit-discourse-k8s-1/charm/venv/ops/framework.py", line 853, in _emit
self._reemit(event_path)
File "/var/lib/juju/agents/unit-discourse-k8s-1/charm/venv/ops/framework.py", line 942, in _reemit
custom_handler(event)
File "./src/charm.py", line 194, in _on_config_changed
self._configure_pod()
File "./src/charm.py", line 673, in _configure_pod
self._run_s3_migration()
File "./src/charm.py", line 606, in _run_s3_migration
process.wait_output()
File "/var/lib/juju/agents/unit-discourse-k8s-1/charm/venv/ops/pebble.py", line 1441, in wait_output
raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/srv/discourse/app/bin/bundle', 'exec', 'rake', 's3:upload_assets'], stdout='gem install rrule -v 0.4.4 -i /srv/discourse/app/plugins/discourse-calendar/gems/3.2.2 --no-document --ignore-dependencies
--no-user-install\nSuccessfully installed rrule-0.4.4\n1 gem installed\ngem install webrick -v 1.7.0 -i /srv/discourse/app/plugins/discourse-prometheus/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nSuccessfully installed webrick-1.7.
0\n1 gem installed\ngem install prometheus_exporter -v 2.0.6 -i /srv/discourse/app/plugins/discourse-prometheus/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nprometheus_exporter will only bind to localhost by default as of v0.5\nSucce
ssfully installed prometheus_exporter-2.0.6\n1 gem installed\ngem install macaddr -v 1.0.0 -i /srv/discourse/app/plugins/discourse-saml/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nSuccessfully installed macaddr-1.0.0\n1 gem installe
d\ngem install uuid -v 2.3.7 -i /srv/discourse/app/plugins/discourse-saml/gems/3.2.2 --no-document --ignore-dependencies --no-user-install\nSuccessfully i' [truncated], stderr="Couldn't connect to Redis\nrake aborted!\nRedis::CannotConnectError: Error co
nnecting to Redis on 192.168.101.92:6379 (Redis::TimeoutError)\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:398:in `rescue in establish_connection'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/red
is/client.rb:379:in `establish_connection'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:115:in `block in connect'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:344:in `with_reconnec
t'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:114:in `connect'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:409:in `ensure_connected'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0
/gems/redis-4.8.1/lib/redis/client.rb:269:in `block in process'\n/srv/discourse/app/vendor/bundle/ruby/3.2.0/gems/redis-4.8.1/lib/redis/client.rb:356:in `logging'\n/srv/discourse/app/vendor/bundle" [truncated]
unit-discourse-k8s-1: 2024-04-16 03:59:43 ERROR juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1
This should be fixed by now: the charm no longer relies on the stored state, so if the pod restarts it should upload the assets correctly. Besides, the assets are now precompiled in the rock, so they will not change unless the image changes.
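Separately from that fix, the traceback shows the immediate trigger was a Redis timeout while running rake s3:upload_assets, which surfaced as an uncaught ops.pebble.ExecError. As an illustration only (this is not the charm's code, and not the fix described above), a transient failure like that could be retried with a short backoff. TransientCommandError and run_with_retries are hypothetical names; in a charm the caught exception would presumably be ops.pebble.ExecError.

```python
import time


class TransientCommandError(Exception):
    """Hypothetical stand-in for an error such as ops.pebble.ExecError."""


def run_with_retries(command, attempts=3, delay=1.0, sleep=time.sleep):
    """Call command() until it succeeds or the attempts run out.

    Waits delay, 2*delay, 4*delay, ... seconds between attempts and
    re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return command()
        except TransientCommandError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(delay * 2 ** (attempt - 1))
```

The sleep parameter is injectable so the backoff can be exercised without real waiting. Whether retrying inside the hook is preferable to letting the hook fail and be re-run is a design choice this sketch does not settle.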
Bug Description
It's possible for Discourse to fail its S3 upload_assets routine after a restart, resulting in errors like the one above.
The HTML served from the pods loads fine, but the asset URLs referenced within it (e.g. JavaScript) point to paths that the upload_assets routine was expected to create and that were never uploaded to S3.
This results in a broken page with a spinner; the main content never appears.
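To confirm this symptom on an affected unit, one rough approach is to pull the script and stylesheet URLs out of the served HTML and then check each one against the bucket. The sketch below covers only the extraction step and uses the standard library; AssetURLCollector and asset_urls are illustrative names, and the choice of which tags to inspect is an assumption.

```python
from html.parser import HTMLParser


class AssetURLCollector(HTMLParser):
    """Collects URLs from <script src=...> and <link href=...> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            self.urls.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.urls.append(attributes["href"])


def asset_urls(html):
    """Return asset URLs referenced by the page, in document order."""
    collector = AssetURLCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```

Each returned URL can then be requested (for example with a HEAD request) to see whether the object actually exists in S3; missing objects would match the spinner-only page described here.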
To Reproduce
Deploy the application with s3_enabled=True and all associated config set.
Wait for (or force) a restart.
Be unlucky enough to experience this bug (it's unclear how/why it happened).
Environment
This is running on an OpenStack cloud, with S3 integration enabled.
Relevant log output
Additional context
No response