Closed monaka closed 8 years ago
Could not run the following programs, are they installed? psql
Are you running v2.0.0? That error indicates psql is not installed in your container.
@bacongobbler Yes, v2.0.0.
It was fixed after I deleted the pod with kubectl delete po deis-database ...
So the container image has psql.
It's not impossible that someone deleted psql in the instance, but that's hard to imagine.
I thought it was caused by memory exhaustion, but the output of free on the node looks healthy:
             total       used       free     shared    buffers     cached
Mem:       7139592    7067840      71752       8384     148344    4530472
-/+ buffers/cache:    2389024    4750568
Swap:      8388604     367704    8020900
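For anyone reading the numbers above: the "free" column on the Mem line looks alarming (about 70 MB), but most of the "used" memory is page cache that the kernel can reclaim. A minimal Python sketch of that arithmetic follows. The helper names `parse_free` and `available_kib` are only illustrative, and the parser assumes the old-style `free` layout shown here, with values in KiB.

```python
def parse_free(output: str) -> dict:
    """Parse old-style `free` output (values in KiB) into nested dicts."""
    stats = {}
    for line in output.strip().splitlines():
        fields = line.split()
        if fields[0] == "Mem:":
            keys = ["total", "used", "free", "shared", "buffers", "cached"]
            stats["mem"] = dict(zip(keys, map(int, fields[1:])))
        elif fields[0] == "Swap:":
            keys = ["total", "used", "free"]
            stats["swap"] = dict(zip(keys, map(int, fields[1:])))
    return stats


def available_kib(mem: dict) -> int:
    """Memory that is free or reclaimable: free + buffers + page cache."""
    return mem["free"] + mem["buffers"] + mem["cached"]


FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       7139592    7067840      71752       8384     148344    4530472
-/+ buffers/cache:    2389024    4750568
Swap:      8388604     367704    8020900
"""

stats = parse_free(FREE_OUTPUT)
avail = available_kib(stats["mem"])
print(avail)  # 4750568, the same value as the "-/+ buffers/cache" free column
```

So roughly 4.5 GiB of the 6.8 GiB total is still usable, which is why the node does not look memory-starved.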
okay so if you're on v2.0.0 then the other reason this error would pop up is if wal-e could not get a connection to the database, as it says in the logs. Since the previous logs say
psql: FATAL: the database system is starting up
I would assume that is your issue, and that the database took an abnormally long time to boot. Once it was restarted it restored faster (likely connection issues to Azure?).
I've got a work-in-progress that removes the wait timeout, which is the likely cause for this issue. https://github.com/deis/postgres/pull/112
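To illustrate why a fixed wait timeout can bite when the database boots slowly, here is a minimal Python sketch. It is not the code from the PR; `wait_until_ready`, `make_slow_db`, and the `is_ready` callback are hypothetical names, and `is_ready` stands in for whatever check tells you the database accepts connections.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    is_ready: Callable[[], bool],
    max_wait: Optional[float] = None,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True.

    With max_wait=None the loop never gives up. Otherwise it returns
    False once about max_wait seconds have been spent sleeping.
    """
    waited = 0.0
    while not is_ready():
        if max_wait is not None and waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
    return True


def make_slow_db(checks_until_ready: int) -> Callable[[], bool]:
    """Fake database that reports ready only on the Nth check."""
    remaining = [checks_until_ready]

    def is_ready() -> bool:
        remaining[0] -= 1
        return remaining[0] <= 0

    return is_ready


def no_sleep(seconds: float) -> None:
    """Stand-in for time.sleep so the demo runs instantly."""


# A database that needs 5 checks to come up, polled with a 2 second limit,
# is reported as failed even though it would have started shortly after.
with_timeout = wait_until_ready(make_slow_db(5), max_wait=2.0, sleep=no_sleep)
without_timeout = wait_until_ready(make_slow_db(5), sleep=no_sleep)
print(with_timeout, without_timeout)  # False True
```

The trade-off is that waiting forever hides a database that never comes up, so something else (for example a Kubernetes probe) has to catch that case.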
I see. I'll try the canary build after #112 is merged. I'll also try to collect more information when this issue reproduces.
BTW, some people using Kube on Azure may have DNS-related issues.
It would make sense if the issues I posted recently were specific to DNS on Azure. Tests (lightweight usage) may pass while production usage fails.
I tried builds based on #112 and they seem to resolve this issue. Even though the CI tests fail randomly, it works well in my Workflow.
(I know PR #112 is WIP and is going to fix this issue in the near future.)
I was confused about my canary images. Let me retract the comment above.
But I'm still having trouble in this area and am investigating...
I don't have any solid evidence, but I guess it happens when WAL-E and psql are executed at the same time (maybe psql inside WAL-E and psql outside WAL-E).
The deis/database container runs psql periodically. I think there is no need to run psql in recovery mode. Is my guess reasonable?
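A rough Python sketch of what I mean, not taken from the deis/database code: PostgreSQL has a built-in `pg_is_in_recovery()` function, so a periodic job could ask the server first and skip its work while recovery is in progress. The `run_query` callback and `should_run_periodic_psql` are hypothetical names; `run_query` is assumed to return the bare value the way `psql -tA` prints it ("t" or "f") and to raise `ConnectionError` when the server refuses the connection.

```python
from typing import Callable

RECOVERY_QUERY = "SELECT pg_is_in_recovery();"


def should_run_periodic_psql(run_query: Callable[[str], str]) -> bool:
    """Return True only when the server is reachable and not in recovery."""
    try:
        answer = run_query(RECOVERY_QUERY).strip()
    except ConnectionError:
        # e.g. "FATAL: the database system is starting up"
        return False
    # psql -tA prints booleans as "t" / "f"
    return answer == "f"


def refusing_server(query: str) -> str:
    """Fake runner for a server that is still starting up."""
    raise ConnectionError("the database system is starting up")


print(should_run_periodic_psql(lambda q: "f\n"))   # True: normal operation
print(should_run_periodic_psql(lambda q: "t\n"))   # False: still recovering
print(should_run_periodic_psql(refusing_server))   # False: not accepting connections
```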
Additional information:
Recovery failures with SIGQUIT may be reduced by upgrading the spec of the node running SkyDNS (not the node running deis/database). In my case, I upgraded from Azure D2_V2 (2 cores / 7 GB RAM) to F4 (4 cores / 8 GB RAM).
Upgrading the specs is not a silver bullet, though, because I still get random terminations by signal 3 from WAL-E. But it's a step forward.
(BTW, I'm curious. According to the official documentation, 2 nodes with 2 cores each are enough to run Deis Workflow. But my cluster requires more. My nodes are not in production, with just a few sample apps running. Why does the cluster require more power...)
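For reference, "signal 3" and SIGQUIT are the same thing on Linux. A small Python sketch shows how a negative child exit code maps back to a signal name. The helper `describe_exit` is only illustrative; it relies on the `subprocess` convention that a return code of -N means the child was killed by signal N.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess-style return code into a readable message."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        return f"terminated by {sig.name} (signal {sig.value})"
    return f"exited with status {returncode}"


print(describe_exit(-3))  # terminated by SIGQUIT (signal 3)
print(describe_exit(1))   # exited with status 1
```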
At least on my Deis Workflow, this issue was resolved by #112. I guess this can be closed after #112 is merged.
This should be resolved via #137. If it isn't, please re-open the issue at wal-e/wal-e. Thanks!
The deis-database on my cluster was in a reboot loop. I'm not sure of the reason yet because it seems to have started while I was asleep.