## Description
`create-cluster` only uploads the Arkime configuration if it hasn't already been uploaded. After that, `config-update` must be used to make changes. This makes it far more likely that we can roll back a failed change to the underlying Docker image or to the dynamic configuration we pull into it.
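The create-only rule can be pictured with a small sketch. Everything in it is hypothetical: `ConfigStore`, `upload_if_absent`, and the in-memory dictionary stand in for whatever S3 and Param Store bookkeeping the CLI really does. It only illustrates "skip the upload if one has already happened".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConfigStore:
    """Hypothetical in-memory stand-in for the cloud-side config record."""

    uploaded: Dict[str, bytes] = field(default_factory=dict)

    def upload_if_absent(self, node_type: str, archive: bytes) -> bool:
        """Upload only when nothing exists yet for this node type.

        Returns True if an upload happened, False if it was skipped.
        """
        if node_type in self.uploaded:
            return False  # never overwrite a config that is already live
        self.uploaded[node_type] = archive
        return True


store = ConfigStore()
first = store.upload_if_absent("capture", b"v1")   # first run uploads
second = store.upload_if_absent("capture", b"v2")  # later runs skip
```

Because the second call is a no-op, re-running cluster creation cannot silently replace a configuration that the containers are already using.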
`config-update` first checks whether the local copy of the Arkime config differs from what's running in the cloud. If it does, the local config is sent to the cloud and the ECS containers are recycled via an ECS force-deploy. If the containers fail to stabilize, we detect that automatically by looking for failed tasks and revert to the previous configuration.
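The flow described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names, the injected callables (`deploy`, `is_stable`, `failed_tasks`), and the use of a SHA-256 hash to detect changes are all assumptions. The failed-task limit of 3 and the 15-second poll interval are taken from the log output in the testing section.

```python
import hashlib
import time
from typing import Callable


def wait_until_stable(
    is_stable: Callable[[], bool],
    failed_tasks: Callable[[], int],
    sleep: Callable[[float], None],
    failed_task_limit: int = 3,
    poll_seconds: float = 15,
    max_polls: int = 40,
) -> bool:
    """Poll until the service stabilizes (True) or looks broken (False)."""
    for _ in range(max_polls):
        if is_stable():
            return True
        if failed_tasks() > failed_task_limit:
            return False
        sleep(poll_seconds)
    return False  # timed out; this sketch treats that as a failure too


def config_update(
    local_archive: bytes,
    deployed_hash: str,
    deploy: Callable[[str], None],
    is_stable: Callable[[], bool],
    failed_tasks: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "skipped", "updated", or "rolled_back"."""
    # Assumption: a content hash is how "different" is decided.
    new_hash = hashlib.sha256(local_archive).hexdigest()
    if new_hash == deployed_hash:
        return "skipped"  # local config matches what is deployed

    deploy(new_hash)  # stands in for: upload archive + ECS force-deploy
    if wait_until_stable(is_stable, failed_tasks, sleep):
        return "updated"

    deploy(deployed_hash)  # point the service back at the previous config
    wait_until_stable(is_stable, failed_tasks, sleep)
    return "rolled_back"


# Fake cloud for demonstration: only the "good" config lets containers start.
good = hashlib.sha256(b"good").hexdigest()
live = {"hash": good}
result = config_update(
    b"exit 1",
    good,
    deploy=lambda h: live.update(hash=h),
    is_stable=lambda: live["hash"] == good,
    failed_tasks=lambda: 0 if live["hash"] == good else 4,
    sleep=lambda seconds: None,  # don't actually wait in the demo
)
```

Passing the cloud interactions in as callables keeps the decision logic testable without AWS access. A real implementation would back them with S3, Param Store, and ECS calls.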
## Testing
* Ran `create-cluster` to demonstrate that it doesn't update the config if it's already uploaded.
(.venv) chelma@3c22fba4e266 aws-aio % ./manage_arkime.py create-cluster --name MyCluster --preconfirm-usage
2023-07-26 12:55:22 - Debug-level logs save to file: /Users/chelma/workspace/Arkime/aws-aio/manage_arkime/manage_arkime.log
2023-07-26 12:55:22 - Using AWS Credential Profile: default
2023-07-26 12:55:22 - Using AWS Region: default from AWS Config settings
2023-07-26 12:55:24 - Usage report:
Arkime Metadata:
    Session Retention [days]: 30
    User History Retention [days]: 365
Capture Nodes:
    Max Count: 2
    Desired Count: 1
    Min Count: 1
    Type: m5.xlarge
OpenSearch Domain:
    Master Node Count: 3
    Master Node Type: m5.large.search
    Data Node Count: 2
    Data Node Type: t3.small.search
    Data Node Volume Size [GB]: 100
S3:
    PCAP Retention [days]: 30
2023-07-26 12:55:24 - Ensuring Arkime Config dir exists for cluster: MyCluster
2023-07-26 12:55:24 - Arkime Config dir exists at: /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster
2023-07-26 12:55:24 - Copying default Arkime Config to dir: /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster
2023-07-26 12:55:24 - Cluster config directory not empty; skipping copy
2023-07-26 12:55:24 - Determining the status of S3 bucket: arkimeconfig-XXXXXXXXXXXX-us-east-2-mycluster
2023-07-26 12:55:25 - S3 Bucket arkimeconfig-XXXXXXXXXXXX-us-east-2-mycluster already exists; no work needed
2023-07-26 12:55:25 - Uploading Arkime config for Capture Nodes...
2023-07-26 12:55:26 - Config has been uploaded previously; skipping
2023-07-26 12:55:26 - Uploading Arkime config for Viewer Nodes...
2023-07-26 12:55:26 - Config has been uploaded previously; skipping
2023-07-26 12:55:26 - Executing command: deploy MyCluster-CaptureBucket MyCluster-CaptureNodes MyCluster-CaptureVPC MyCluster-OSDomain MyCluster-ViewerNodes
2023-07-26 12:55:26 - NOTE: This operation can take a while. You can 'tail -f' the logfile to track the status.
2023-07-26 12:58:09 - Deployment succeeded
* Changed the Capture Nodes' configuration (but not the Viewer Nodes' configuration) and ran `config-update`. The CLI uploaded and bounced the Capture Nodes successfully.
(.venv) chelma@3c22fba4e266 aws-aio % ./manage_arkime.py config-update --cluster-name MyCluster
2023-07-26 12:45:34 - Debug-level logs save to file: /Users/chelma/workspace/Arkime/aws-aio/manage_arkime/manage_arkime.log
2023-07-26 12:45:34 - Using AWS Credential Profile: default
2023-07-26 12:45:34 - Using AWS Region: default from AWS Config settings
2023-07-26 12:45:35 - Updating Arkime config for Capture Nodes, if necessary...
2023-07-26 12:45:35 - Turning Capture configuration at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/capture into archive at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/capture.zip
2023-07-26 12:45:35 - Pulling existing configuration details from Param Store at: /arkime/clusters/MyCluster/capture-config-details
2023-07-26 12:45:35 - Uploading config archive to S3 bucket: arkimeconfig-XXXXXXXXXXXX-us-east-2-mycluster
2023-07-26 12:45:36 - Updating config details in Param Store at: /arkime/clusters/MyCluster/capture-config-details
2023-07-26 12:45:37 - Bouncing ECS Service MyCluster-CaptureNodes-ServiceD69D759B-xCoMmgDzf3eZ to pick up the new Arkime config...
2023-07-26 12:45:39 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:45:54 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:46:10 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:46:26 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:46:42 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:46:58 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:47:14 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:47:30 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:47:45 - ECS Service MyCluster-CaptureNodes-ServiceD69D759B-xCoMmgDzf3eZ bounced successfully
2023-07-26 12:47:45 - Updating Arkime config for Viewer Nodes, if necessary...
2023-07-26 12:47:45 - Turning Viewer configuration at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/viewer into archive at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/viewer.zip
2023-07-26 12:47:45 - Pulling existing configuration details from Param Store at: /arkime/clusters/MyCluster/viewer-config-details
2023-07-26 12:47:46 - Local config is the same as what's currently deployed; skipping
* Updated the Capture Nodes' configuration so that the container spinup process would fail (inserted an `exit 1`), then ran `config-update`. The CLI uploaded the new config, saw it was broken, and successfully rolled back to the previous configuration.
(.venv) chelma@3c22fba4e266 aws-aio % ./manage_arkime.py config-update --cluster-name MyCluster
2023-07-26 12:49:58 - Debug-level logs save to file: /Users/chelma/workspace/Arkime/aws-aio/manage_arkime/manage_arkime.log
2023-07-26 12:49:58 - Using AWS Credential Profile: default
2023-07-26 12:49:58 - Using AWS Region: default from AWS Config settings
2023-07-26 12:49:58 - Updating Arkime config for Capture Nodes, if necessary...
2023-07-26 12:49:58 - Turning Capture configuration at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/capture into archive at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/capture.zip
2023-07-26 12:49:58 - Pulling existing configuration details from Param Store at: /arkime/clusters/MyCluster/capture-config-details
2023-07-26 12:49:59 - Uploading config archive to S3 bucket: arkimeconfig-XXXXXXXXXXXX-us-east-2-mycluster
2023-07-26 12:50:00 - Updating config details in Param Store at: /arkime/clusters/MyCluster/capture-config-details
2023-07-26 12:50:01 - Bouncing ECS Service MyCluster-CaptureNodes-ServiceD69D759B-xCoMmgDzf3eZ to pick up the new Arkime config...
2023-07-26 12:50:02 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:50:18 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:50:34 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:50:50 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:51:06 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:51:22 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:51:38 - Failed task limit (3) exceeded; rolling back to previous config
2023-07-26 12:51:38 - Pulling in-progress configuration details from Param Store at: /arkime/clusters/MyCluster/capture-config-details
2023-07-26 12:51:38 - Uploading reverted config details to Param Store at: /arkime/clusters/MyCluster/capture-config-details
2023-07-26 12:51:39 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:51:55 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:52:11 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:52:27 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:52:43 - Waiting 15 more seconds for ECS service to finish bouncing...
2023-07-26 12:52:58 - ECS Service MyCluster-CaptureNodes-ServiceD69D759B-xCoMmgDzf3eZ bounced successfully
2023-07-26 12:52:58 - Updating Arkime config for Viewer Nodes, if necessary...
2023-07-26 12:52:58 - Turning Viewer configuration at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/viewer into archive at /Users/chelma/workspace/Arkime/aws-aio/config-MyCluster/viewer.zip
2023-07-26 12:52:58 - Pulling existing configuration details from Param Store at: /arkime/clusters/MyCluster/viewer-config-details
2023-07-26 12:52:59 - Local config is the same as what's currently deployed; skipping
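
The rollback in the log above is triggered by the "Failed task limit (3) exceeded" check. One way such a count could be derived is sketched below. It assumes task descriptions shaped like the `tasks` list returned by the ECS DescribeTasks API, and the choice of which `stopCode` values count as failures is an assumption of this sketch, not something this PR description states. `count_failed_tasks` and `FAILURE_STOP_CODES` are hypothetical names.

```python
from typing import Any, Dict, Iterable

# Stop codes this sketch treats as "the task failed". Tasks stopped by the
# scheduler during a normal deployment are deliberately not counted.
FAILURE_STOP_CODES = {"TaskFailedToStart", "EssentialContainerExited"}


def count_failed_tasks(tasks: Iterable[Dict[str, Any]]) -> int:
    """Count stopped tasks whose stop code indicates a failure."""
    return sum(
        1
        for task in tasks
        if task.get("lastStatus") == "STOPPED"
        and task.get("stopCode") in FAILURE_STOP_CODES
    )


sample = [
    {"lastStatus": "STOPPED", "stopCode": "EssentialContainerExited"},
    {"lastStatus": "STOPPED", "stopCode": "ServiceSchedulerInitiated"},
    {"lastStatus": "RUNNING"},
]
failed = count_failed_tasks(sample)
```

Filtering on the stop code rather than on container exit codes avoids counting the old tasks that ECS itself stops while replacing them during the force-deploy.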
## License
I confirm that this contribution is made under an Apache 2.0 license and that I have the authority necessary to make this contribution on behalf of its copyright owner.