apache / shardingsphere-elasticjob

Distributed scheduled job
Apache License 2.0

ElasticJob cannot be restored when DNS service is down! #1934

Open sdlzhd opened 3 years ago

sdlzhd commented 3 years ago

Bug Report

Which version of ElasticJob did you use?

2.1.5

Which project did you use? ElasticJob-Lite or ElasticJob-Cloud?

ElasticJob-Lite

Expected behavior

ElasticJob is available again after DNS is restored.

Actual behavior

ElasticJob is down

Reason analyze (If you can)

If I connect to ZooKeeper (zkp) by IP address, a NodeExistsException is thrown when the network is disconnected, and ElasticJob ignores that NodeExistsException in the RegExceptionHandler class.

But if I connect to ZooKeeper by domain name, and the network has been restored while the DNS server has not, an IllegalArgumentException (A HostProvider may not be empty) is thrown. RegExceptionHandler rethrows it wrapped in a RegException, and ElasticJob can no longer resume.
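
The difference between the two cases can be sketched as follows. This is only a simplified illustration of the behavior described above, not the actual ElasticJob source: `RegExceptionHandlerSketch` and its nested `NodeExistsException` and `RegException` classes are hypothetical stand-ins for the real ElasticJob and ZooKeeper types.

```java
public class RegExceptionHandlerSketch {

    // Stand-in for the ZooKeeper exception named in this report (assumption, not the real class).
    static class NodeExistsException extends Exception { }

    // Stand-in for ElasticJob's unchecked registry exception (assumption, not the real class).
    static class RegException extends RuntimeException {
        RegException(Throwable cause) {
            super(cause);
        }
    }

    // Only exceptions considered harmless are ignored.
    static boolean isIgnored(Throwable cause) {
        return cause instanceof NodeExistsException;
    }

    // Swallows ignorable exceptions and wraps everything else in an unchecked RegException.
    static void handleException(Exception cause) {
        if (cause == null) {
            return;
        }
        if (isIgnored(cause) || (cause.getCause() != null && isIgnored(cause.getCause()))) {
            System.out.println("ignored: " + cause.getClass().getSimpleName());
        } else if (cause instanceof InterruptedException) {
            Thread.currentThread().interrupt();
        } else {
            throw new RegException(cause);
        }
    }

    public static void main(String[] args) {
        // IP case: the exception is on the ignore list, so the caller keeps running.
        handleException(new NodeExistsException());
        // Domain case: IllegalArgumentException is not on the ignore list, so it is rethrown.
        try {
            handleException(new IllegalArgumentException("A HostProvider may not be empty"));
        } catch (RegException e) {
            System.out.println("rethrown: " + e.getCause());
        }
    }
}
```

In this sketch the IP case returns normally, while the domain case propagates a RegException to whatever code called the handler.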

Steps to reproduce the behavior.

Use an IP address

<!-- Ignore other parameters -->
<reg:zookeeper id="regCenter" server-lists="192.168.1.100" />
  1. start application
  2. disable network
  3. resume application

Use a domain name

hosts entry: 192.168.1.100 zkp.demo.io

<!-- Ignore other parameters -->
<reg:zookeeper id="regCenter" server-lists="zkp.demo.io" />
  1. start application
  2. disable network
  3. delete the hosts entry 192.168.1.100 zkp.demo.io
  4. resume application
TeslaCN commented 3 years ago

Hi @sdlzhd, if DNS is down, the IP of the registry center cannot be resolved. How would it resume?

sdlzhd commented 3 years ago

I mean that when the network recovers but DNS has not yet fully recovered (this may take 15 seconds), ElasticJob goes offline and its thread dies, so ElasticJob never comes back online.
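
To illustrate why a short DNS delay can be fatal, here is a minimal sketch. It assumes the reconnect step runs on its own thread and that the exception escapes uncaught. Everything in it (`ReconnectThreadSketch`, `flakyConnect`, `runOnce`, `runWithRetry`) is hypothetical and does not reflect ElasticJob's real threading; it only contrasts a thread that dies on the first failure with one that retries until name resolution works again.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReconnectThreadSketch {

    // Hypothetical connect step: fails for the first `failures` calls, as if DNS were still down.
    static Runnable flakyConnect(AtomicInteger attempts, int failures) {
        return () -> {
            if (attempts.incrementAndGet() <= failures) {
                throw new IllegalArgumentException("A HostProvider may not be empty");
            }
        };
    }

    // Runs connect once on a new thread; an uncaught exception ends the thread for good.
    static boolean runOnce(Runnable connect) throws InterruptedException {
        boolean[] connected = {false};
        Thread t = new Thread(() -> {
            connect.run();
            connected[0] = true;
        });
        t.setUncaughtExceptionHandler((thread, e) -> { }); // keep the demo output quiet
        t.start();
        t.join();
        return connected[0];
    }

    // Same step, but the thread catches the failure and tries again after a delay.
    static boolean runWithRetry(Runnable connect, int maxAttempts, long delayMillis)
            throws InterruptedException {
        boolean[] connected = {false};
        Thread t = new Thread(() -> {
            for (int i = 0; i < maxAttempts && !connected[0]; i++) {
                try {
                    connect.run();
                    connected[0] = true;
                } catch (IllegalArgumentException e) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        });
        t.start();
        t.join();
        return connected[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("single attempt connected: "
                + runOnce(flakyConnect(new AtomicInteger(), 2)));
        System.out.println("with retry connected: "
                + runWithRetry(flakyConnect(new AtomicInteger(), 2), 5, 10L));
    }
}
```

With a single attempt the thread ends on the first failure and nothing brings it back, which matches the behavior reported here. With a bounded retry, the same transient failure is survived once resolution succeeds.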