Is your feature request related to a problem? Please describe.
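As the issue title states, the amplifier API integration component crashes dependent relayers whenever its health check against the amplifier API endpoint fails. In the log below, a single timed-out request to the health endpoint is returned as a task error, and the relayer engine shuts down the whole system: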
2024-11-20T10:31:48.103497Z ERROR start_and_wait_for_shutdown: relayer_engine: /Users/if_you_know_you_know/.cargo/git/checkouts/axelar-relayer-core-ee05e092797b9627/bb7071c/crates/relayer-engine/src/lib.rs:53: A task returned an error, shutting down the system err=
0: Reqwest error error sending request for url (https://amplifier-devnet-amplifier.devnet.axelar.dev/health)
1: error sending request for url (https://amplifier-devnet-amplifier.devnet.axelar.dev/health)
2: client error (SendRequest)
3: http2 error
4: keep-alive timed out
5: operation timed out
Location:
/Users/if_you_know_you_know/.cargo/git/checkouts/axelar-relayer-core-ee05e092797b9627/bb7071c/crates/relayer-amplifier-api-integration/src/healthcheck.rs:13
Describe the solution you'd like
First, we should support failover by allowing multiple amplifier API endpoints to be configured. If the primary endpoint fails, the system should automatically fall back to the other endpoints.
Second, the health check should not be able to terminate the entire process. On failure it should log the error, keep monitoring, and retry across the available endpoints.
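To make the proposal concrete, here is a minimal sketch of one health-check round with failover. It is not the existing relayer-core code: `HealthStatus`, `check_with_failover`, and the `probe` closure are hypothetical names, and `probe` stands in for the real HTTP request to each endpoint's health route so that the sketch needs only the standard library.

```rust
/// Outcome of one health-check round over all configured endpoints.
/// (Hypothetical type, not part of the existing crates.)
#[derive(Debug, PartialEq)]
enum HealthStatus {
    /// Position and URL of the first endpoint that answered.
    Healthy { index: usize, url: String },
    /// Every endpoint failed; the errors are reported, not propagated.
    AllUnhealthy { errors: Vec<String> },
}

/// Probe the endpoints in order, primary first, and stop at the first
/// healthy one. `probe` is an assumption standing in for the real
/// health request; it returns `Err` with a message on failure.
fn check_with_failover<F>(endpoints: &[String], mut probe: F) -> HealthStatus
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut errors = Vec::new();
    for (index, url) in endpoints.iter().enumerate() {
        match probe(url) {
            Ok(()) => {
                return HealthStatus::Healthy {
                    index,
                    url: url.clone(),
                }
            }
            Err(err) => {
                // Log and move on to the next endpoint instead of
                // returning an error that would stop the process.
                eprintln!("health check failed for {url}: {err}");
                errors.push(format!("{url}: {err}"));
            }
        }
    }
    HealthStatus::AllUnhealthy { errors }
}
```

The function returns a status value rather than a `Result`, so a caller running it on a timer can treat `AllUnhealthy` as a signal to log and retry on the next tick rather than as a fatal task error.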
Additional context
The affected component impacts all dependent relayers. The decision to re-evaluate this at a later stage was made by quorum consensus.