coherenceplatform / cnc

CNC is the first framework for precision platform engineering
https://cncframework.com
GNU General Public License v3.0

`cnc toolbox` doesn't work on Mac OS #159

Closed: aheitzmann closed this issue 1 month ago

aheitzmann commented 1 month ago

Description

The `cnc toolbox start` command loops indefinitely, repeatedly printing "Warning: issue starting port forwarding session for db (failed_attempts: )" errors.

Running the underlying aws ssm start-session command directly sets up the port forward successfully.

Cause

The relevant code is start_ssm_sessions in toolbox/main.sh.j2:

start_ssm_sessions () {
    PID_FALLBACK="none"
    while test -f $TOOLBOX_ACTIVE_TEMP_FILEPATH
    do
    {% for resource in environment.database_resources + environment.cache_resources %}
        if ! test -d /proc/${SSM_PID_{{ loop.index }}:-$PID_FALLBACK}/ > /dev/null
        then
            SSM_RETRY_COUNT_{{ loop.index }}=${SSM_RETRY_COUNT_{{ loop.index }}:-0}
            if [ $SSM_RETRY_COUNT_{{ loop.index }} -eq 0 ]
            then
                echo -e "\nStarting ssm port-forwarding session for {{ resource.name }}..."
            fi

            aws ssm start-session --target {{ bastion_instance_id }} \
            --document-name AWS-StartPortForwardingSessionToRemoteHost \
            --parameters '{{ resource.settings.toolbox_ssh_port_mapping }}' > /tmp/cnc_ssh_output_{{ loop.index }} 2>&1 &
            SSM_PID_{{ loop.index }}=$!

            if [ $SSM_RETRY_COUNT_{{ loop.index }} -gt 3 ]
            then
                cmd_output=$(</tmp/cnc_ssh_output_{{ loop.index }})
                echo -e "\nWarning: issue starting port forwarding session for {{ resource.name }}" \
                "(failed_attempts: $SSM_RETRY_COUNT_{{ loop.index }})\n$cmd_output"
            fi

            SSM_RETRY_COUNT_{{ loop.index }}=`expr $SSM_RETRY_COUNT_{{ loop.index }} + 1`
        fi
    {% endfor %}
    sleep 10
    done &
}

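Note that /proc does not exist on macOS, so test -d /proc/${SSM_PID_{{ loop.index }}:-$PID_FALLBACK}/ never succeeds and the negated condition if ! test -d ... is always true. As a result, every 10-second pass treats the session as dead, launches a new aws ssm start-session process, and increments the retry counter, even when the previous session is still running.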
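One portable way to check whether a background process is still alive, on both Linux and macOS, is kill -0 "$pid": it sends no signal and only reports through its exit status whether the PID exists and can be signalled. The sketch below is a hypothetical replacement for the /proc check, not necessarily what the eventual fix does; the function name ssm_session_alive is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given PID refers to a live process.
# kill -0 sends no signal; it only checks that the process exists and that
# we are allowed to signal it. Works on Linux and macOS (no /proc needed).
ssm_session_alive () {
    local pid="$1"
    # An empty PID (session never started) counts as "not alive".
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# Usage: start a short-lived background job and check it before and after exit.
sleep 1 &
demo_pid=$!

if ssm_session_alive "$demo_pid"; then
    echo "running"
fi

# Reap the job so its PID no longer exists.
wait "$demo_pid"

if ! ssm_session_alive "$demo_pid"; then
    echo "exited"
fi
```

In the template, the condition would become something like if ! ssm_session_alive "${SSM_PID_{{ loop.index }}:-}", which avoids depending on a Linux-only filesystem.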

Repro Details

Machine: Apple M1 Pro
OS: macOS Sonoma 14.6.1 (23G93)

cnc.yml

services:
  app:
    ports:
      - "8080:8080"
    build:
      context: .
      dockerfile: Dockerfile
    deploy:
      resources:
        limits:
          cpus: 0.5
          memory: 2g
    x-cnc:
      type: backend
      system:
        health_check: /health
        platform_settings:
          min_scale: 1
          max_scale: 1
  db:
    x-cnc:
      type: database
      version: 16
    image: postgres

environments.yml

name: backend
provider: aws
flavor: ecs
version: 1

collections:
  - name: main
    region: us-east-2
    base_domain: app.groundedft.com
    account_id: "009160027816"
    environments:
      - name: demo
        environment_variables:
          - name: LOG_LEVEL
            value: INFO
          - name: FRAGMENT_API_URL
            value: https://api.us-east-1.fragment.dev/graphql
          - name: FRAGMENT_AUTH_URL
            value: https://auth.us-east-1.fragment.dev/oauth2/token
          - name: FRAGMENT_AUTH_SCOPE
            value: https://api.us-east-1.fragment.dev/*
          - name: FRAGMENT_LEDGER_ID
            value: 4cd9af29-23c2-4bff-94d7-db3ea0d63462
          - name: FRAGMENT_API_KEY
            value: 3cqj3rmep2btb6g51bett11t63
          - name: DB_MIGRATIONS_STARTUP_CHECK
            # TODO: change to "verify"
            value: skip
          # Aliases
          # Note: DATABASE_URL is automatically provided by cnc but uses "postgres"
          # instead of "postgresql" as the scheme. Rather than correct this in code for
          # SQLAlchemy compatibility, we'll alias the other DB-related env vars to match our expectations.
          - name: DATABASE_HOST
            alias: DB_HOST
          - name: DATABASE_PORT
            alias: DB_PORT
          - name: DATABASE_NAME
            alias: DB_NAME
          - name: DATABASE_USERNAME
            alias: DB_USER
          - name: DATABASE_PASSWORD
            alias: DB_PASSWORD
          # Manually added secrets:
          - name: FRAGMENT_API_SECRET
            secret_id: "main/backend/demo/3p_access:main-backend-demo-fragment_api_secret::"
          - name: ADMIN_API_SECRET
            secret_id: "main/backend/demo/3p_access:main-backend-demo-admin_api_secret::"
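On the aliasing note in environments.yml: SQLAlchemy 1.4 and later no longer accept the postgres:// scheme, only postgresql://. If you wanted to correct the scheme instead of aliasing, a container entrypoint could rewrite it with bash parameter expansion before starting the app. This is a sketch under that assumption; the function to_sqlalchemy_url and the variable SQLALCHEMY_DATABASE_URL are hypothetical and not provided by cnc.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint fragment: derive a SQLAlchemy-compatible URL from
# a DATABASE_URL-style value by swapping a leading "postgres://" scheme
# for "postgresql://". Other schemes (including postgresql://) are untouched.
to_sqlalchemy_url () {
    local url="$1"
    # ${var/#pattern/replacement} replaces the pattern only at the start.
    printf '%s\n' "${url/#postgres:\/\//postgresql://}"
}

# Example with a made-up URL (not real credentials).
example_url="postgres://backend:secret@localhost:5442/demo"
SQLALCHEMY_DATABASE_URL="$(to_sqlalchemy_url "$example_url")"
echo "$SQLALCHEMY_DATABASE_URL"   # prints postgresql://backend:secret@localhost:5442/demo
```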

Command: cnc toolbox start demo --service-name db --proxy-only

Output:

Common(application=<Application (name: backend | provider: aws (ecs/1))>, collection=None)
DEBUG (cnc.models.application:125) no default provided - returning first collection as default for <Application (name: backend | provider: aws (ecs/1))>
Sending {'name': 'toolbox.start'} to RS for 79CC224E-1B37-536A-8659-88B23899344A
DEBUG (cnc.models.toolbox:73) Rendering toolbox script for <ToolboxManager:<Environment (name: demo | collection: main | ['app', 'db'])> @ {'db': 'latest'}> @ /tmp/.cnc_tmp_backend/app_backend/aws_ecs/1/d572e3f7765cc827262d96eafd8e756fff32090a02573d02df757b977be73e55/toolbox
DEBUG (cnc.models.environment_collection:332) Going to get outputs for <EnvironmentCollection (main | 009160027816) [1 envs]>: {}
DEBUG (cnc.models.provisioner:58) Cleaning up & setting up at start for <ProvisionStageManager: <EnvironmentCollection (main | 009160027816) [1 envs]> | output_only: True>
DEBUG (cnc.models.provisioner:67) Writing main.tf.j2 for <ProvisionStageManager: <EnvironmentCollection (main | 009160027816) [1 envs]> | output_only: True>
DEBUG (cnc.models.provisioner:147) Installing TF modules/providers for <ProvisionStageManager: <EnvironmentCollection (main | 009160027816) [1 envs]> | output_only: True>
INFO (cnc.models.provisioner:398) TF RUN (<ProvisionStageManager: <EnvironmentCollection (main | 009160027816) [1 envs]> | output_only: True>): [['terraform', 'init']] 0 in 7 seconds
INFO (cnc.models.provisioner:398) TF RUN (<ProvisionStageManager: <EnvironmentCollection (main | 009160027816) [1 envs]> | output_only: True>): [['terraform', 'output', '-json']] 0 in 0 seconds
DEBUG (cnc.models.toolbox:76) Done rendering toolbox script for <ToolboxManager:<Environment (name: demo | collection: main | ['app', 'db'])> @ {'db': 'latest'}>, starting toolbox...

You are currently authenticated as: {
    "UserId": "AROAQEIP252UNLSNR76LG:alex",
    "Account": "009160027816",
    "Arn": "arn:aws:sts::009160027816:assumed-role/AWSReservedSSO_GroundedBackendProvisioning_2e37e3bf90986ef1/alex"
}

Starting ssm port-forwarding session for db...

Toolbox started in proxy only mode.

The following environment variables are normally injected into your service container. They can be used to establish connections to your AWS resources from your local machine.

DATABASE_PASSWORD=<redacted>
FRAGMENT_API_SECRET=<redacted>
ADMIN_API_SECRET=<redacted>
CNC_APPLICATION_NAME=backend
CNC_ENVIRONMENT_NAME=demo
CNC_ENVIRONMENT_DOMAIN=demo.app.groundedft.com
CNC_ENVIRONMENT_REGION=us-east-2
CNC_INSTANCE_NAME=ea8a1c02af-backe-main-demo-app
PORT=8080
LOG_LEVEL=INFO
FRAGMENT_API_URL=https://api.us-east-1.fragment.dev/graphql
FRAGMENT_AUTH_URL=https://auth.us-east-1.fragment.dev/oauth2/token
FRAGMENT_AUTH_SCOPE=https://api.us-east-1.fragment.dev/*
FRAGMENT_LEDGER_ID=4cd9af29-23c2-4bff-94d7-db3ea0d63462
FRAGMENT_API_KEY=3cqj3rmep2btb6g51bett11t63
DB_MIGRATIONS_STARTUP_CHECK=skip
DATABASE_HOST=cc8091660f-backe-main-demo-db.proxy-criygig4oah6.us-east-2.rds.amazonaws.com
DATABASE_PORT=5432
DATABASE_NAME=demo
DATABASE_USERNAME=backend
DB_ENDPOINT=localhost
DB_HOST=localhost
DB_IP=localhost
DB_PORT=5442
DB_NAME=demo
DB_USER=backend
DB_ENGINE=postgres
DB_DATABASE_URL=<redacted>
DATABASE_URL=<redacted>
DB_DB_PASSWORD=<redacted>
DB_PASSWORD=<redacted>

 Press Ctrl+C to exit. (all port-forwarding sessions will be closed)

Warning: issue starting port forwarding session for db (failed_attempts: 4)
Warning: issue starting port forwarding session for db (failed_attempts: 5)
Warning: issue starting port forwarding session for db (failed_attempts: 6)
Starting session with SessionId: alex-qr5ngulogsy3lr6tulcccxdcji
Warning: issue starting port forwarding session for db (failed_attempts: 7)
<...continues indefinitely>
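The numbering in this output follows from the template: because the /proc check always fails on macOS, every pass relaunches the session and increments the counter, and the warning is only printed once the counter exceeds 3, so the first warning reads failed_attempts: 4 on the fifth pass. The hypothetical simulation below replays that counter logic for one resource with the aws ssm call and the 10-second sleep stubbed out, so it can run anywhere; simulate_passes is an illustrative name, not part of cnc.

```shell
#!/usr/bin/env bash
# Simulates the retry-counter logic from start_ssm_sessions for one resource,
# assuming the liveness check always fails (as it does on macOS, where /proc
# is missing). The aws ssm call and the 10-second sleep are omitted.
simulate_passes () {
    local passes="$1"
    local retry_count=0
    local pass
    for ((pass = 1; pass <= passes; pass++)); do
        if [ "$retry_count" -eq 0 ]; then
            echo "Starting ssm port-forwarding session for db..."
        fi
        # (the real script relaunches aws ssm start-session here)
        if [ "$retry_count" -gt 3 ]; then
            echo "Warning: issue starting port forwarding session for db (failed_attempts: $retry_count)"
        fi
        retry_count=$((retry_count + 1))
    done
}

simulate_passes 7
```

With seven passes this prints the start message followed by warnings for attempts 4, 5 and 6, matching the first lines of the output above. With the real 10-second sleep, the first warning appears roughly 40 seconds after startup.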

A variant of the same problem occurs when running cnc toolbox start demo --service-name app, since the app container needs to connect to the database.

zach-withcoherence commented 1 month ago

Thanks so much for this, @aheitzmann! We've got a fix in progress here and should have this working next week.

zach-withcoherence commented 1 month ago

Got this out today, should be fixed in https://github.com/coherenceplatform/cnc/releases/tag/0.2.19

aheitzmann commented 1 month ago

Confirmed: both cnc toolbox start demo --service-name app and cnc toolbox start demo --service-name db --proxy-only now work on my Mac with the new cocnc version.