Closed amavinag closed 4 years ago
I've seen this before, but was always unable to pin it down. My current theory is that the way systemd is ordering the jobs may be not working correctly in some cases.
The next time this comes up run systemctl list-jobs
on the appliance and put the output here.
That will tell us if there are any current systemd unit tasks running (if I'm right I would expect the evmserverd start task to be shown as queued there).
/cc @simaishi
@carbonin: I see this problem again with "miq-nightly-201608152000". no jobs running in "systemctl list-jobs"
[root@~]# systemctl list-jobs No jobs running.
[root@ ~]# systemctl | grep evm evm-watchdog.service loaded active running evm watchdog process
[root@ ~]# tail -f /var/www/miq/vmdb/log/evm.log [----] I, [2016-08-16T11:38:00.631226 #8511:b6597c] INFO -- : MIQ(SessionStore) Using session_store: ActionDispatch::Session::MemCacheStore [----] I, [2016-08-16T11:38:01.230512 #8511:b6597c] INFO -- : MIQ(Vmdb::Initializer.init) - Program Name: bin/rails, PID: 8511, ENV['MIQ_GUID']: , ENV['EVMSERVER']: [----] I, [2016-08-16T11:38:10.832402 #10473:12c3980] INFO -- : MIQ(Vmdb::Loggers.apply_config) Log level for vim.log has been changed to [WARN] [----] I, [2016-08-16T11:38:11.495127 #10473:12c3980] INFO -- : MIQ(SessionStore) Using session_store: ActionDispatch::Session::MemCacheStore [----] I, [2016-08-16T11:38:11.666478 #10473:12c3980] INFO -- : MIQ(Vmdb::Initializer.init) - Program Name: /opt/rubies/ruby-2.2.5/bin/rake, PID: 10473, ENV['MIQ_GUID']: , ENV['EVMSERVER']: [----] I, [2016-08-16T11:38:41.942822 #11927:c5b980] INFO -- : MIQ(Vmdb::Loggers.apply_config) Log level for vim.log has been changed to [WARN] [----] I, [2016-08-16T11:38:42.605064 #11927:c5b980] INFO -- : MIQ(SessionStore) Using session_store: ActionDispatch::Session::MemCacheStore [----] I, [2016-08-16T11:38:42.759884 #11927:c5b980] INFO -- : MIQ(Vmdb::Initializer.init) - Program Name: bin/rails, PID: 11927, ENV['MIQ_GUID']: , ENV['EVMSERVER']: [----] I, [2016-08-16T11:38:44.737053 #11927:c5b980] INFO -- : MIQ(MiqDatabase.seed) Creating MiqDatabase record [----] I, [2016-08-16T11:38:44.945779 #11927:c5b980] INFO -- : MIQ(MiqRegion.seed) Creating Region [0]
we are observing this issue more often in recent nightly builds. is there any update on this issue? /cc @simaishi @dajohnso
I opened a PR that will hopefully give us some additional information around the spot where we start and enable the evmserverd
service.
If this happens again, check the appliance_console.log
. It may have more detail.
I just hit this issue and I don't see anything different in appliance_console.log
...
@carbonin other people seem to be hitting this issue. Since we're not sure what's causing this to happen, is it possible to add a retry or something if it's enabled but not started?
Encountered the issue again and seeing a bit more details in appliance_console.log
this time...
E, [2017-06-15T12:16:04.881923 #11972] ERROR -- : Failed to enable and start evmserverd service: /bin/systemctl exit code:
E, [2017-06-15T12:16:04.882098 #11972] ERROR -- : /opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/gems/awesome_spawn-1.4.1/lib/awesome_spawn.rb:105:in `run!'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/gems/linux_admin-0.20.1/lib/linux_admin/common.rb:24:in `run!'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/gems/linux_admin-0.20.1/lib/linux_admin/service/systemd_service.rb:18:in `start'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/database_configuration.rb:234:in `block in start_evm'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/database_configuration.rb:232:in `fork'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/database_configuration.rb:232:in `start_evm'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/internal_database_configuration.rb:101:in `post_activation'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/cli.rb:192:in `set_internal_db'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/cli.rb:165:in `set_db'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/cli.rb:148:in `run'
/opt/rubies/ruby-2.3.1/lib/ruby/gems/2.3.0/bundler/gems/manageiq-gems-pending-22f8dd86d6e8/lib/gems/pending/appliance_console/cli.rb:311:in `parse'
/usr/bin/appliance_console_cli:11:in `<main>'
This issue has been automatically marked as stale because it has not been updated for at least 6 months.
If you can still reproduce this issue on the current release or on master
, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions!
@carbonin This is still happening... anything else we can do to troubleshoot?
I haven't had time to look into this, but I've also seen it. I added the pinned
label so it won't get marked as stale again.
In the times I've seen it, I've still been able to just start up evmserverd
without trouble, but didn't see anything that could have caused it. It really seems like a timing issue.
@carbonin @simaishi Happening to me consistently, installing on RHV using Ovirt.Manageiq role. So far 5 out of 5 failed, I'm able to start it manually right afterwards. Let me know if I can help on debugging or feedback. ApplianceConsole is taking ~25 seconds to initialize, that should be good enough to allow things to settle, the failure comes 7 seconds later but without an exit code from systemctl it's hard to say why it failed.
I, [2018-12-06T09:59:07.372162 #4849] INFO -- : MIQ(ManageIQ::ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : starting
I, [2018-12-06T09:59:19.736010 #4849] INFO -- : MIQ(ManageIQ::ApplianceConsole::InternalDatabaseConfiguration#initialize_postgresql) : complete
I, [2018-12-06T09:59:32.833896 #4849] INFO -- : MIQ(ManageIQ::ApplianceConsole::InternalDatabaseConfiguration#create_region) : starting
I, [2018-12-06T10:00:03.476615 #4849] INFO -- : MIQ(ManageIQ::ApplianceConsole::InternalDatabaseConfiguration#create_region) : complete
E, [2018-12-06T10:00:10.163587 #15587] ERROR -- : Failed to enable and start evmserverd service: /bin/systemctl exit code:
E, [2018-12-06T10:00:10.163879 #15587] ERROR -- : /usr/local/lib/ruby/gems/2.4.0/gems/awesome_spawn-1.4.1/lib/awesome_spawn.rb:105:in `run!'
/usr/local/lib/ruby/gems/2.4.0/gems/linux_admin-1.2.2/lib/linux_admin/common.rb:24:in `run!'
/usr/local/lib/ruby/gems/2.4.0/gems/linux_admin-1.2.2/lib/linux_admin/service/systemd_service.rb:18:in `start'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/database_configuration.rb:237:in `block in start_evm'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/database_configuration.rb:235:in `fork'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/database_configuration.rb:235:in `start_evm'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/internal_database_configuration.rb:108:in `post_activation'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/cli.rb:225:in `set_internal_db'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/cli.rb:192:in `set_db'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/cli.rb:170:in `run'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/lib/manageiq/appliance_console/cli.rb:412:in `parse'
/usr/local/lib/ruby/gems/2.4.0/gems/manageiq-appliance_console-3.3.0/bin/appliance_console_cli:7:in `<top (required)>'
/usr/local/bin/appliance_console_cli:23:in `load'
/usr/local/bin/appliance_console_cli:23:in `<main>'
I can confirm this issue persists with downstream CF builds as well. Upon every reboot I must start EVM server.
That is very different @bthurber.
In this issue the service is getting enabled, but not started on first boot. This is the expected behavior for ManageIQ, but not for CloudForms.
In CloudForms the user is expected to configure a database (either create or join a region) for each new appliance, but after that the service is expected to be started and enabled. It sounds like your service was started on first configuration, but not enabled which is the opposite of this issue, where the service is enabled, but not started.
@gtanzillo We continue seeing this problem, is there anything we can do?
Sometime I see Failed to enable and start evmserverd service: /bin/systemctl exit code:
(without an actual exit code) in appliance_console.log and other times I don't even see that message. In both cases I can manually start evmserverd and things work after that.
@simaishi does this happen with the systemd
integration that @agrare recently implemented?
Anything is possible but most likely not, the systemd work went in in May '19 most of these reports are from before that and the systemd ruby gems aren't installed by default.
I'm not sure exactly what changes with the systemd work that was done... Would it change how we start evmserverd? Based on the logs I saw, it looked like evmserverd
service was enabled but not started at all.
@simaishi short summary, we can configurably use systemd to start and manage miq_worker processes rather than miq_server/worker_management. There is a config setting for this and the main evmserverd.service file is not impacted
@carbonin I was able to reproduce this issue using ivanchuk-1 vagrant image and can confirm calling systemctl enable
with --now
option in LinuxAdmin fixes the issue (rather than calling enable.start
).
Can you please take a look and fix this properly? (My "fix" works only for testing as I just duplicated enable
method and added --now
option 😄)
Thanks to the chat we had this morning... that gave me a hint.
I deployed an appliance which has manageiq-appliance_console 5.3.2. For the first time, I actually see a message printed for the exit code:
E, [2020-03-10T08:44:03.055517 #6779] ERROR -- : Failed to enable and start evmserverd service: /bin/systemctl exit code: error was: Created symlink /etc/systemd/system/multi-user.target.wants/evmserverd.service → /usr/lib/systemd/system/evmserverd.service.
evmserverd started fine, though. Looks like that message was treated as error and that was preventing start
to be called after enable
before, but since enable and start are done in 1 command now, it works?
@simaishi Is the service enabled?
Yes, enabled and started.
Adding --quiet
to systemctl command might get rid of that error?
I wonder if it's writing that output to standard output or exiting non-zero.
Sorry, standard error
[intermittent]: evm service failed to auto start upon appliance boot. however, "evmserverd" starts running without problem if started manually. this issue is seen intermittently on virtualcenter and rhevm.
version: miq-nightly-201608052000 anaconda-post log: https://gist.github.com/amavinag/a7a008820188398be13db7366db1d7bd evmlog: https://gist.github.com/amavinag/149c4337e4d64af8096ef2c346ca72b4