apache / iotdb

Apache IoTDB
https://iotdb.apache.org/
Apache License 2.0

[Bug] Docker signal handling not working #13498

Open fschulze-dtm opened 1 month ago

fschulze-dtm commented 1 month ago

Search before asking

Version

iotdb 1.3.1-standalone

Describe the bug and provide the minimal reproduce step

When stopping a Docker container running the apache/iotdb:1.3.1-standalone image, the SIGTERM trap is not executed, which leads to a non-graceful shutdown. This is because the entrypoint.sh script uses exec, which replaces the shell process and with it the signal handlers registered via trap.
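
The mechanism can be illustrated outside of IoTDB. The following is a minimal sketch, not the actual entrypoint.sh: because exec replaces the shell, a trap registered beforehand has no shell left to run in, whereas starting the service in the background and waiting on it keeps the shell alive to handle SIGTERM. The names run_with_trap and on_stop_demo are hypothetical, and sleep stands in for the real server process.

```shell
# Hypothetical stand-in for on_stop: report, then stop the child process.
on_stop_demo() {
    echo "on_stop ran"
    kill "$child_pid" 2>/dev/null || true
}

# Start the service in the background instead of using `exec`, so this
# shell stays alive and its trap can fire.
run_with_trap() {
    trap on_stop_demo TERM
    sleep 30 &                             # placeholder for the real server process
    child_pid=$!
    wait "$child_pid" || true              # returns early when SIGTERM is trapped
    wait "$child_pid" 2>/dev/null || true  # reap the stopped child
    trap - TERM
}
```

Sending SIGTERM to a shell that is inside run_with_trap prints the handler's message and stops the child. If the background start is replaced by exec sleep 30, the handler never runs, since the shell that registered it no longer exists.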

Furthermore, the on_stop function defined in entrypoint.sh, which should be executed on SIGTERM, contains the condition "$start_what" != "all". Therefore, in standalone mode the corresponding graceful shutdown is not executed.

To reproduce, run the Docker container and then stop it.
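
One possible way to script these steps is sketched below. The container name iotdb-sigtest and the 30-second startup delay are assumptions; the log marker is the "manually flush" banner that on_stop prints in the entrypoint.sh snippets quoted in this thread. If the trap runs, that banner should appear in the container log.

```shell
# Sketch of the reproduction. Nothing runs until the function is called.
reproduce_missing_trap() {
    name="iotdb-sigtest"   # hypothetical container name
    docker run -d --name "$name" apache/iotdb:1.3.1-standalone
    sleep 30               # assumed time for the service to come up
    docker stop "$name"    # sends SIGTERM, then SIGKILL after the grace period
    if docker logs "$name" 2>&1 | grep -q "manually flush"; then
        echo "on_stop ran"
    else
        echo "on_stop did not run"
    fi
    docker rm "$name" >/dev/null
}
```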

What did you expect to see?

The on_stop function defined in entrypoint.sh is executed when the Docker container is stopped, providing a graceful shutdown with FLUSH.

What did you see instead?

Rapid shutdown without proper signal handling and without execution of the on_stop function.

Anything else?

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 month ago

Hi, this is your first issue in IoTDB project. Thanks for your report. Welcome to join the community!

CritasWang commented 1 month ago

Is there no problem with the logic here?

if [[ "$start_what" != "confignode" ]]; then
    echo "###### manually flush ######";
    start-cli.sh -e "flush;" || true
    stop-datanode.sh
    echo "##### done ######";
else
    stop-confignode.sh;
fi

fschulze-dtm commented 1 month ago

The snippet above is from the apache/iotdb:1.3.2-standalone image. In apache/iotdb:1.3.1-standalone it is:

if [[ "$start_what" == "datanode" ]]; then
    echo "###### manually flush ######";
    start-cli.sh -e "flush;" || true
    echo "stopping datanode service";
    stop-datanode.sh ;
    echo "##### done ######";
elif [[ "$start_what" != "all" ]]; then
    echo "###### manually flush ######";
    start-cli.sh -e "flush;" || true
    echo "stopping confignode and datanode service";
    stop-standalone.sh ;
    echo "##### done ######";
elif [[ "$start_what" == "confignode" ]]; then
    echo "stopping confignode service";
    stop-confignode.sh;
    echo "##### done ######";
fi
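
Reading the 1.3.1 chain top to bottom: with start_what set to all, the first test (== "datanode") fails, the second (!= "all") fails, and the third (== "confignode") fails, so no branch runs. With confignode, the second branch matches and stop-standalone.sh is called. A minimal sketch of the mapping the branches seem to intend is below. The helper names select_stop_script and needs_flush are hypothetical, and the helpers only print or test values rather than call the real scripts.

```shell
# Hypothetical helper: print the stop script that matches a start mode.
select_stop_script() {
    case "$1" in
        datanode)   echo "stop-datanode.sh" ;;
        all)        echo "stop-standalone.sh" ;;   # i.e. == "all", not != "all"
        confignode) echo "stop-confignode.sh" ;;
        *)          echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeeds when a manual flush applies to the mode.
# In the snippet, only the confignode branch skips the flush.
needs_flush() {
    [ "$1" != "confignode" ]
}
```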

Also, the main problem remains: entrypoint.sh uses exec, which discards the trap.

CritasWang commented 4 weeks ago

Actually, a graceful shutdown only requires calling the stop script.

start-cli.sh -e "flush;"

This flush is only an extra safeguard. After the stop script is called, the program also performs the corresponding graceful shutdown processing internally.