dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

bug: concurrent map read and map write #439

Open dhiaayachi opened 2 months ago

dhiaayachi commented 2 months ago

Expected Behavior

I ran into a weird error in a test run while using the Temporal auto-setup Docker image, version 1.19.1. I don't think I'm doing anything special, just starting some basic containers.

Actual Behavior

The following logs show how Temporal failed to start: https://gist.github.com/vikstrous2/7d016b5562903b723d93b6a403589620

Steps to Reproduce the Problem

Start temporal from this docker-compose file over and over again until this error triggers:

```yaml
version: '3.4'
services:
  temporal-db:
    image: postgres:9.6.24-alpine@sha256:8342bcb43446694428ec6594e72e4299692854f0fc3aca090b0ab46f4c7f32a1
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: temporal
      POSTGRES_USER: temporal
    ports:
      - 5434:5432
    healthcheck:
      interval: 1000h
      test: 'true'
  temporal:
    image: temporalio/auto-setup:1.19.1@sha256:3b582c47c354e7f9958c098f168ceb514766ab93526e9be1d772179663710d0f
    restart: unless-stopped
    depends_on:
      - temporal-db
    environment:
      - DB=postgresql
      - DB_PORT=5432
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal
      - POSTGRES_SEEDS=temporal-db
    ports:
      - 7233:7233
    healthcheck:
      interval: 1000h
      test: 'true'
```

Specifications

dhiaayachi commented 1 month ago

Thank you for reporting this issue. It looks like the issue you are experiencing is the Resource Exhausted Cause Concurrent Limit error.

The Docker image you are using runs all of the Temporal Server's services in a single process. In this setup, you might be exceeding the MaxConcurrentActivityExecutionSize Worker option, which defaults to 1000. This means that the process cannot support more than 1,000 simultaneous Activity Task Executions.

If the number of pending Activities exceeds the MaxConcurrentActivityExecutionSize limit, the Temporal Server will fail the next Workflow Task with a Resource Exhausted Cause Concurrent Limit error.

To resolve this issue, increase the MaxConcurrentActivityExecutionSize value by customizing your Worker options in the Temporal Go SDK. See How to set WorkerOptions in Go for a full list of options.

For more information about customizing Docker images and the available WorkerOptions, see the Temporal Docker Builds repository.

You might want to increase MaxConcurrentActivityExecutionSize to at least the same value as your Poller Count.

Alternatively, change the configuration of your Temporal Service so that each service runs in a separate Docker container. This lets you scale the services independently and gives you better control over how many resources each service uses. See the temporalio/docker-compose repo for an example of how to do this. This can be complex, so for production deployments you might want to consider Temporal Cloud.

dhiaayachi commented 1 month ago

Thanks for reporting this issue!

The logs indicate that the Temporal Server is having trouble connecting to the PostgreSQL database. The most likely cause is a mismatch between the Temporal Server version and the PostgreSQL version.

To resolve this issue, try the following:

1. **Upgrade your PostgreSQL to a supported version.** Check the [Temporal Server release notes](https://github.com/temporalio/temporal/releases) for the latest supported PostgreSQL versions for your Temporal Server version.
2. **Update your database schema.** Use the `temporal-sql-tool` to update your PostgreSQL schema for the new version. For details, see the [Upgrade Server](https://docs.temporal.io/self-hosted-guide/upgrade-server#upgrade-server) section.

If you're still experiencing the issue, please provide the following information:

- The exact versions of the Temporal Server and PostgreSQL you are using.
- The output of the `temporal server start-dev --help` command.
- Any error messages you see in the Temporal Server logs.

This will help us troubleshoot the issue further.

dhiaayachi commented 1 month ago

Thanks for reporting this issue!

It looks like the error you are seeing is caused by a known bug in Temporal v1.19.1 that was resolved in v1.19.2.

Please upgrade to the latest version to resolve the issue: 
* [https://github.com/temporalio/temporal/releases/tag/v1.19.2](https://github.com/temporalio/temporal/releases/tag/v1.19.2)