Closed gabrii closed 11 months ago
To add more details, the task keeps restarting, each time stopping with the error "Task failed ELB health checks":
Eventually, the token for the CLI expires and it crashes. But the stack deployment continues for a long time, until it gets automatically rolled back:
There are two issues:
Thank you for finding this out! We need to fix both of them.
For now I would suggest using CI to deploy the environment.
Describe the bug
The ApiStack deployment gets stuck while creating the ECS service: the service never becomes healthy, so creation never completes.
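Since the deployment hinges on the load balancer health check, it can help to see what the health endpoint returns while the stack is stuck. Below is a minimal Python sketch, assuming the `/lbcheck` path that appears in the ECS logs further down is the health-check path. The `probe_health` helper and the URL in the usage comment are hypothetical and not part of the boilerplate.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe_health(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # A 4xx/5xx answer (for example 503) still means the server responded.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return None


# Hypothetical usage; replace the URL with your own API domain:
# status = probe_health("https://api.example.com/lbcheck")
# print("health check status:", status)
```

A 503 here would match the "Service Unavailable: /lbcheck" lines in the logs, while `None` would point to a networking problem rather than an application one.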
It's my 4th day trying to get the SaaS boilerplate to run on AWS. I managed to work around all the previous problems and my own mistakes, and then got stuck on this one. Today I did a completely clean setup (except for some minor fixes that I had to make on previous runs, like an `-entrypoint` typo that should have been `--entrypoint`, and some other typos that were causing errors on previous runs on Windows).
Steps to reproduce
System Info
Logs
`pnpm saas deploy` where it gets stuck (old to new)
```shell
backend: coherent-qa-ApiStack | 18/23 | 4:27:18 PM | CREATE_COMPLETE | AWS::Route53::RecordSet | ApiService/DNSMainPublicLoadBalancer1 (ApiServiceDNSMainPublicLoadBalancer14FFC75F9)
backend: coherent-qa-ApiStack | 19/23 | 4:27:18 PM | CREATE_COMPLETE | AWS::Route53::RecordSet | ApiService/DNSMainPublicLoadBalancer0 (ApiServiceDNSMainPublicLoadBalancer0A4DA258C)
backend: 19/23 Currently in progress: coherent-qa-ApiStack, ApiService199661B5
```
Here are the logs from ECS (from new to old)
```shell
November 02, 2023 at 16:31 (UTC+1:00) Service Unavailable: /lbcheck a0bb0d81c5e64bc9bf291ef10fb9aaee backend
November 02, 2023 at 16:31 (UTC+1:00) Service Unavailable: /lbcheck a0bb0d81c5e64bc9bf291ef10fb9aaee backend
November 02, 2023 at 16:31 (UTC+1:00) Service Unavailable: /lbcheck a0bb0d81c5e64bc9bf291ef10fb9aaee backend
November 02, 2023 at 16:31 (UTC+1:00) Service Unavailable: /lbcheck a0bb0d81c5e64bc9bf291ef10fb9aaee backend
November 02, 2023 at 16:31 (UTC+1:00) Encountered an issue while polling targets. ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:31 (UTC+1:00) Traceback (most recent call last): ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:31 (UTC+1:00) File "/pkgs/__pypackages__/3.11/lib/urllib3/connection.py", line 174, in _new_conn ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:31 (UTC+1:00) conn = connection.create_connection( ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:31 (UTC+1:00) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.... full stack trace ...
November 02, 2023 at 16:31 (UTC+1:00) raise EndpointConnectionError(endpoint_url=request.url, error=e) ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:31 (UTC+1:00) botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:2000/SamplingTargets" ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:31 (UTC+1:00) Service Unavailable: /lbcheck a0bb0d81c5e64bc9bf291ef10fb9aaee backend
November 02, 2023 at 16:31 (UTC+1:00) Service Unavailable: /lbcheck
... many of these ....
November 02, 2023 at 16:28 (UTC+1:00) Service Unavailable: /lbcheck ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:28 (UTC+1:00) Service Unavailable: /lbcheck ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:28 (UTC+1:00) No effective centralized sampling rule match. Fallback to local rules. ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:28 (UTC+1:00) No effective centralized sampling rule match. Fallback to local rules. ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:27 (UTC+1:00) [2023-11-02 15:27:44 +0000] [40] [INFO] Booting worker with pid: 40 ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:27 (UTC+1:00) [2023-11-02 15:27:44 +0000] [39] [INFO] Booting worker with pid: 39 ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:27 (UTC+1:00) [2023-11-02 15:27:44 +0000] [36] [INFO] Starting gunicorn 21.2.0 ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:27 (UTC+1:00) [2023-11-02 15:27:44 +0000] [36] [INFO] Listening at: http://0.0.0.0:80 (36) ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:27 (UTC+1:00) [2023-11-02 15:27:44 +0000] [36] [INFO] Using worker: gevent ab577365f6b24b6aa71d07d72d551ae0 backend
November 02, 2023 at 16:27 (UTC+1:00) Starting app server...
```
What's interesting is that the worker is running and responding to requests (from the ECS service task logs)...
```shell
November 02, 2023 at 16:39 (UTC+1:00) Service Unavailable: /lbcheck a9ccfe57d4f4416eb096ab0625a009da backend
November 02, 2023 at 16:39 (UTC+1:00) 10.0.1.175 - - [02/Nov/2023:15:39:39 +0000] "POST /api/graphql/ HTTP/1.1" 200 298 "-" "Amazon CloudFront" a9ccfe57d4f4416eb096ab0625a009da backend
November 02, 2023 at 16:39 (UTC+1:00) [2023-11-02 15:39:38 +0000] [40] [INFO] Booting worker with pid: 40 a3d652eae7464c9eb4954b7a35dba52f backend
November 02, 2023 at 16:39 (UTC+1:00) [2023-11-02 15:39:38 +0000] [39] [INFO] Booting worker with pid: 39 a3d652eae7464c9eb4954b7a35dba52f backend
November 02, 2023 at 16:39 (UTC+1:00) [2023-11-02 15:39:38 +0000] [38] [INFO] Starting gunicorn 21.2.0 a3d652eae7464c9eb4954b7a35dba52f backend
November 02, 2023 at 16:39 (UTC+1:00) [2023-11-02 15:39:38 +0000] [38] [INFO] Listening at: http://0.0.0.0:80 (38) a3d652eae7464c9eb4954b7a35dba52f backend
November 02, 2023 at 16:39 (UTC+1:00) [2023-11-02 15:39:38 +0000] [38] [INFO] Using worker: gevent
```
Even though it returns 200, there is an error on login, as it seems the database migrations have not been run (from the webapp login error message):
```shell
relation "users_user" does not exist
LINE 1: ...r"."otp_base32", "users_user"."otp_auth_url" FROM "users_use...
                                                             ^
```
(which I'm guessing will be done in later steps of `pnpm saas deploy`, but please let me know if that's not the case).
The only other issue I had in this clean run, setting up a new SaaS, is this error from the workers, which suspiciously are being deployed as local (???): (old to new)
```bash
workers: > workers@2.3.0 sls /app/packages/workers
workers: > sls "--version"
workers: Framework Core: 3.35.2 (local)
workers: Plugin: 7.0.3
workers: SDK: 4.4.0
workers: > workers@2.3.0 sls /app/packages/workers
workers: > sls "deploy" "--stage" "local"
workers: Warning: Invalid configuration encountered
workers: at 'functions.ExportUsers.vpc': must have required property 'securityGroupIds'
workers: at 'functions.ExportUsers.vpc': must have required property 'subnetIds'
workers: at 'functions.SynchronizeContentfulContent.vpc': must have required property 'securityGroupIds'
workers: at 'functions.SynchronizeContentfulContent.vpc': must have required property 'subnetIds'
workers: at 'functions.WebSocketsConnectHandler.environment': must be object
workers: at 'functions.WebSocketsConnectHandler.vpc': must have required property 'securityGroupIds'
workers: at 'functions.WebSocketsConnectHandler.vpc': must have required property 'subnetIds'
workers: at 'functions.WebSocketsMessageHandler.environment': must be object
workers: at 'functions.WebSocketsMessageHandler.vpc': must have required property 'securityGroupIds'
workers: at 'functions.WebSocketsMessageHandler.vpc': must have required property 'subnetIds'
workers: at 'functions.WebSocketsDisconnectHandler.environment': must be object
workers: at 'functions.WebSocketsDisconnectHandler.vpc': must have required property 'securityGroupIds'
workers: at 'functions.WebSocketsDisconnectHandler.vpc': must have required property 'subnetIds'
workers: Learn more about configuration validation here: http://slss.io/configuration-validation
workers: Deploying coherent-workers to stage local (us-east-1)
workers: Using serverless-localstack
workers: serverless-localstack: Reconfigured endpoints
workers: Error:
workers: Inaccessible host: `localstack' at port `undefined'. This service may not be available in the `us-east-1' region.
workers: × Stack coherent-workers failed to deploy (211s)
workers: Environment: linux, node 18.18.2, framework 3.35.2 (local), plugin 7.0.3, SDK 4.4.0
workers: Credentials: Local, environment variables
workers: Docs: docs.serverless.com
workers: Support: forum.serverless.com
workers: Bugs: github.com/serverless/serverless/issues
workers: 3 deprecations found: run 'serverless doctor' for more details
workers: ELIFECYCLE Command failed with exit code 1.
workers: Warning: run-commands command "docker-compose run --rm --entrypoint /bin/bash workers /app/packages/workers/scripts/runtime/run_deploy.sh" exited with non-zero status code
```
Validations