Open dhiaayachi opened 2 weeks ago
docker-compose-multirole.yaml
**Issue**

This issue is about the `docker-compose-multirole.yaml` example in the `temporalio/docker-compose` repository. Here's a breakdown of the issue and potential solutions:
**Problem:**

The `temporal-history` service gets stuck at "Waiting for Temporal server to start..." and fails to connect to the `temporal-frontend` service via Nginx. The whole cluster seems out of sync, with `connect() failed (111: Connection refused)` errors from Nginx and `unable to health check` errors from the history service.

**Potential Causes:**

- The `temporal-history` service might be unable to connect to the `temporal-frontend` service due to network connectivity issues or DNS resolution problems.
- The `temporal-history` service might be starting before the `temporal-frontend` service is fully initialized and listening on the specified port, leading to a connection refusal.
- A startup race: nothing guarantees the `temporal-frontend` service is ready before the `temporal-history` service attempts to connect.

**Troubleshooting Steps:**

- Verify network connectivity between the `temporal-history` container and the `temporal-frontend` container. Use `ping`, or `docker exec` to run `nslookup` within the containers, to validate DNS resolution.
- In the `docker-compose-multirole.yaml` file, adjust the service startup order using the `depends_on` property to ensure the `temporal-frontend` service starts before the `temporal-history` service:

```yaml
services:
  temporal-frontend:
    # ...
    depends_on:
      - temporal-nginx
  temporal-history:
    # ...
    depends_on:
      - temporal-frontend
```
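Note that `depends_on` in this short form only orders container *start-up*; it does not wait for the frontend to be ready. The Compose spec also supports gating a dependent service on a health check, sketched below; the probe command and port 7233 are assumptions (Temporal's default frontend gRPC port), not values taken from the actual compose file:

```yaml
services:
  temporal-frontend:
    # ...
    healthcheck:
      # Assumed probe: any command that succeeds once the frontend's
      # gRPC port is accepting connections (nc must exist in the image).
      test: ["CMD", "nc", "-z", "localhost", "7233"]
      interval: 5s
      timeout: 3s
      retries: 12
  temporal-history:
    # ...
    depends_on:
      temporal-frontend:
        condition: service_healthy   # wait for the health check, not just start-up
```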
**docker-compose up Timeouts:**

Increase the `timeout` value for the `docker-compose up` command to allow services more time to start:

```shell
docker compose -f docker-compose-multirole.yaml up -t 300  # increase the timeout to 5 minutes
```
- Check the `docker-compose-multirole.yaml` file for any errors in the Nginx configuration. Verify that the `upstream` directive in the `temporal` server block is correctly configured and points to the `temporal-frontend` service.
- Review the Temporal server configuration (the `development.yaml` file) to ensure there are no delays in the server's startup process. If necessary, adjust values such as `maxJoinDuration` or `rpcAddress` to improve connectivity and reduce startup delays.
- Use the `grpc-health-probe` tool from the grpc-ecosystem to check whether the frontend service is healthy and reachable by the history service.

**Additional Tips:**

- Examine the logs of the `temporal-frontend`, `temporal-history`, and Nginx services to gain more insight into the issue.

**Conclusion:**
This issue appears to be related to a combination of network connectivity, race conditions, and potentially service startup order issues. By carefully reviewing the configuration files and adjusting the service startup order, you should be able to mitigate this intermittent problem. If the issue persists, reach out to the Temporal community or support team for assistance.
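As a sketch of the `grpc-health-probe` suggestion above: the address uses Temporal's default frontend gRPC port 7233, and the service name is the one from the health-check errors. Both are assumptions to adjust for your setup, and the helper name is hypothetical:

```shell
# Hypothetical helper: probe the frontend's gRPC health endpoint.
# grpc-health-probe must be installed; the address and service name
# below are assumptions based on Temporal defaults.
check_frontend_health() {
  if ! command -v grpc-health-probe >/dev/null 2>&1; then
    echo "grpc-health-probe is not installed (see grpc-ecosystem/grpc-health-probe)"
    return 0
  fi
  grpc-health-probe -addr="${1:-temporal-frontend:7233}" \
    -service=temporal.api.workflowservice.v1.WorkflowService \
    || echo "frontend health check failed"
}

check_frontend_health
```

Run it from a container on the same compose network so the `temporal-frontend` hostname resolves.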
Thanks for reporting this issue! The logs show that `temporal-history` is unable to health check the `temporal.api.workflowservice.v1.WorkflowService` service. This is most likely because the `temporal-frontend` service is not available at the expected address, causing the connection to be refused.

The logs show the Frontend Service instances listening on 192.168.16.8:7237 and 192.168.16.9:7236, while the History service is trying to reach the Frontend Service on 192.168.16.8:7237 and 192.168.16.9:7236.
You may need to review your `docker-compose-multirole.yaml` to ensure these addresses are properly configured.
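For reference, a multirole compose file typically pins these addresses through environment variables on each service. A minimal sketch, assuming the `temporalio/server` auto-setup image and its `SERVICES`/`BIND_ON_IP`/`TEMPORAL_BROADCAST_ADDRESS` variables (the image and variable values here are assumptions, not taken from your file):

```yaml
  temporal-frontend:
    image: temporalio/server:latest                   # assumed image
    environment:
      - SERVICES=frontend                             # run only the frontend role
      - BIND_ON_IP=0.0.0.0                            # listen on all interfaces
      - TEMPORAL_BROADCAST_ADDRESS=temporal-frontend  # address advertised to ring members
```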
To help debug further, could you please tell me:

- Have you made any changes to the `docker-compose-multirole.yaml` file?
- Can you share the logs from `temporal-frontend`, `temporal-frontend2`, and `temporal-history`?

Once I have this information I can provide a more tailored solution for your issue.
Thank you for reporting this issue. We are aware that the `docker-compose-multirole.yaml` example sometimes experiences a race condition.

The issue occurs when the Temporal History service is unable to reach the Temporal Frontend Service via Nginx, which can happen due to a race condition in the startup process.

Here are some troubleshooting steps you can try:

- Raise the `worker_connections` setting in Nginx's configuration to increase the number of connections allowed.

If these steps don't resolve the issue, please provide more information about your environment, and we can then provide more specific guidance.
For further reference, please refer to Temporal's docker-compose documentation.
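One way to work around the startup race described above is to have dependent containers wait until the frontend port actually accepts connections before proceeding. A minimal bash sketch; the host and port are assumptions based on Temporal's defaults:

```shell
# Wait until host:port accepts a TCP connection, retrying once per second.
# Uses bash's /dev/tcp, so no extra tools are needed in the container.
wait_for_port() {
  local host=$1 port=$2 retries=${3:-30}
  local i
  for ((i = 0; i < retries; i++)); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0   # port is accepting connections
    fi
    sleep 1
  done
  return 1       # gave up after $retries attempts
}

# Example: wait briefly for the (assumed) frontend address, then report.
wait_for_port temporal-frontend 7233 2 || echo "frontend not reachable yet"
```

A loop like this can run as an entrypoint wrapper in the dependent service, which is more robust than `depends_on` alone.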
Thanks for reporting the issue.
It seems the issue is due to the temporal-history service failing to connect to the frontend service. This could be caused by a network issue or a problem with the docker-compose configuration.

Could you try running the docker-compose command with the `--verbose` flag to get more detailed logs?

In the meantime, you could try workarounds such as restarting the affected services. If none of these solve the issue, please provide more information about your environment; this will help us understand the issue and find a solution.
Thanks for reporting this issue! It appears that the `temporal-history` service gets stuck when you start your multirole cluster using the example in the `temporalio/docker-compose` repo.

Could you tell me what version of Temporal Server you are using? I noticed you are using Temporal CLI version 1.22.4.

Also, have you tried running the `docker-compose-multirole.yaml` example with a different database backend, such as MySQL or Cassandra?

Here are some additional tips:

- Make sure the network between the `temporal-history` container and the `temporal-frontend` container is not blocked by any firewall rules or network configuration.
- Check the logs of the `temporal-frontend` and `temporal-nginx` containers for any errors or warnings that might be related to the issue.
- Try restarting the `temporal-frontend` and `temporal-nginx` containers.

If the issue is still not resolved, please provide the following information to help us better understand the problem:

- Logs from the `temporal-history`, `temporal-frontend`, and `temporal-nginx` containers.
- The full output of the `docker compose -f docker-compose-multirole.yaml up` command.

Let me know if you have any more questions.
The issue is about the `docker-compose-multirole.yaml` example in the `temporalio/docker-compose` repo. I am posting it here because `temporalio/docker-compose` does not have an issues page.

Expected Behavior

When I run `docker compose -f docker-compose-multirole.yaml up`, the whole multirole cluster is up and running normally.

Actual Behavior

When I run `docker compose -f docker-compose-multirole.yaml up`, temporal-history is sometimes stuck at "Waiting for Temporal server to start...", unable to reach the frontend service via nginx. The whole cluster does not seem to be in sync either, because I cannot connect to the UI service as well.

This does not always happen, so you need to try multiple times; to me it feels like the chance is about 20%, especially when you stop docker-compose after running the service for a long time.

Restarting temporal-frontend or temporal-frontend2 sometimes brings the cluster back to a normal state, but not always.
Steps to Reproduce the Problem
```shell
git clone https://github.com/temporalio/docker-compose
cd docker-compose
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
docker compose -f docker-compose-multirole.yaml up
```
Specifications
Here is the log:

Setting `TEMPORAL_CLI_SHOW_STACKS=true` on temporal-history does not help much: