cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev

Serverless offline framework support #121

Closed mouhannad-sh closed 5 years ago

mouhannad-sh commented 5 years ago

Is your feature request related to a problem? Please describe.
I can't run Cube.js in the same serverless-offline instance. The /playground/context endpoint was returning [Object Object] as the value of apiUrl, which caused requests to the meta endpoint to be routed incorrectly.

After looking at the source code, I figured out that the env variable CUBEJS_API_URL needs a value, so I fixed the routing issue by adding CUBEJS_API_URL='' to my .env.

Of course, this didn't fix the main issue, which is running the load endpoint offline.

Describe the solution you'd like
A way to use and configure Cube.js from serverless.yml so that it works with serverless-offline and serverless-offline-sns.

Describe alternatives you've considered
N/A

Additional context
I'm trying to run Cube.js locally using the configs from serverless.yml. I was able to generate a schema using the playground app; however, I'm stuck in an endless loop and keep getting the same messages in the console:

Serverless: ANY /cubejs-api/v1/load (λ: cubejs)
Load Request: {"query":"{\"measures\":[\"Consultants.count\"],\"timeDimensions\":[{\"dimension\":\"Consultants.createdAt\",\"granularity\":\"day\"}],\"filters\":[]}","authInfo":{"iat":1559631634,"exp":1559718034}}
Query started: {"query":"SELECT date_trunc('day', (consultants.created_at::timestamptz AT TIME ZONE 'UTC')) \"consultants.created_at_date\", count(consultants.id) \"consultants.count\" FROM public.consultants AS consultants GROUP BY 1 ORDER BY 1 ASC LIMIT 10000","params":[]}
Missing cache for: {"cacheKey":["SELECT\n      date_trunc('day', (consultants.created_at::timestamptz AT TIME ZONE 'UTC')) \"consultants.created_at_date\", count(consultants.id) \"consultants.count\"\n    FROM\n      public.consultants AS consultants\n  GROUP BY 1 ORDER BY 1 ASC LIMIT 10000",[],[]]}
Missing cache for: {"cacheKey":["select max(consultants.updated_at) from public.consultants AS consultants",[]]}

I suspect this has to do with the SNS configurations from the @cubejs-backend/serverless-aws package.

I'm still a novice in serverless and AWS, but after some investigation it seems that Cube.js is not configured to run offline with serverless, and generating the schema from this setup was challenging.

The SNS instance seems to be configured for production only, not for development.

It would be really nice if we could run Cube.js with serverless-offline and serverless-offline-sns.

My serverless.yml:

service: Analytics

custom:
  api_link: "api/v1/"
  serverless-offline:
    port: 3004
  serverless-offline-sns:
    port: 4002 # a free port for the sns server to run on
    debug: true
provider:
  name: aws
  runtime: nodejs10.x
  stage: ${opt:stage, 'dev'}
  timeout: 10 #Default Lambda timeout
  versionFunctions: false
  memorySize: 2048 #Default Lambda Memory Size
  deploymentBucket: Analytics
  region: ap-southeast-2
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "sns:*"
      Resource:
        - "*"
  environment:
    # Dev DB config
    dev_db_client: ${env:dev_db_client}
    dev_db_host: ${env:dev_db_host}
    dev_db_name: ${ssm:/Analytics/dev_db_name~true}
    dev_db_user: ${ssm:/Analytics/dev_db_user~true}
    dev_db_password: ${ssm:/Analytics/dev_db_password~true}
    dev_db_min_pool: ${env:dev_db_min_pool}
    dev_db_max_pool: ${env:dev_db_max_pool}
    # Staging DB config
    staging_db_client: ${env:staging_db_client}
    staging_db_host: ${env:staging_db_host}
    staging_db_name: ${ssm:/Analytics/staging/db_name~true}
    staging_db_user: ${ssm:/Analytics/staging/db_user~true}
    staging_db_password: ${ssm:/Analytics/staging/db_password~true}
    staging_db_min_pool: ${env:staging_db_min_pool}
    staging_db_max_pool: ${env:staging_db_max_pool}
    # CUBE CONFIGS
    CUBEJS_DB_HOST: ${env:CUBEJS_DB_HOST}
    CUBEJS_DB_NAME: ${env:CUBEJS_DB_NAME}
    CUBEJS_DB_USER: ${env:CUBEJS_DB_USER}
    CUBEJS_DB_PASS: ${env:CUBEJS_DB_PASS}
    # CUBEJS_DB_PORT: 3004
    # REDIS_URL: redis://localhost
    CUBEJS_DB_TYPE: postgres
    CUBEJS_API_SECRET: "...."
    CUBEJS_APP: "${self:service.name}-${self:provider.stage}"
    # Production DB config
    production_db_client: ${env:production_db_client}
    production_db_host: ${env:production_db_host}
    production_db_name: ${ssm:/Analytics/production/db_name~true}
    production_db_user: ${ssm:/Analytics/production/db_user~true}
    production_db_password: ${ssm:/Analytics/production/db_password~true}
    production_db_min_pool: ${env:production_db_min_pool}
    production_db_max_pool: ${env:production_db_max_pool}
    NODE_PATH: "./:/opt/node_modules"
    NODE_ENV: ${env:NODE_ENV,'development'}
    CUBEJS_API_URL:
      Fn::Join:
        - ""
        - - "https://"
          - Ref: "ApiGatewayRestApi"
          - ".execute-api."
          - Ref: "AWS::Region"
          - ".amazonaws.com/${self:provider.stage}"
    AWS_ACCOUNT_ID:
      Fn::Join:
        - ""
        - - Ref: "AWS::AccountId"

package:
  individually: true
  excludeDevDependencies: false
  exclude:
    - node_modules/**
    - client-size/**
    - server-side/modules/**
    - .env
    - .env.example
    - .eslintignore
    - .eslintrc.json
    - ".idea/**"
    - package-lock.json
    - package.json
    - README.md

layers:
  Analytics:
    path: server-side/lib # required, path to layer contents on disk
    name: ${self:provider.stage}-Analytics-main # optional, Deployed Lambda layer name
    description: Analytics main layer # optional, Description to publish to AWS
    compatibleRuntimes: # optional, a list of runtimes this layer is compatible with
      - nodejs10.x
    package:
      include:
        - "node_modules/**"

functions:
  cubejs:
    handler: cube.api
    timeout: 30
    events:
      - http:
          path: /
          method: GET
      - http:
          path: /{proxy+}
          method: ANY
  cubejsProcess:
    handler: cube.process
    timeout: 630
    events:
      - sns: "${self:service.name}-${self:provider.stage}-process"

plugins:
  - serverless-offline
  - serverless-offline-sns
  - serverless-dotenv-plugin
  - serverless-domain-manager
  - serverless-express

paveltiunov commented 5 years ago

@mouhannad-sh Hey Nedo! Thanks for posting this one! Yep. It happens because CUBEJS_API_URL uses an AWS-specific variable substitution. It's actually deprecated and not required anymore, so you can safely remove it from serverless.yml. We need to remove it from the template as well.

benswinburne commented 4 years ago

@mouhannad-sh did you manage to get this to work? I'm having trouble: I get this error from the SDK.

Invalid parameter: TopicArn Reason: Invalid namespace: [object Object]

benswinburne commented 4 years ago

I've since found that the [object Object] comes from (I believe) serverless-offline failing to interpolate the following.

    AWS_ACCOUNT_ID:
      Fn::Join:
        - ""
        - - Ref: "AWS::AccountId"

If I set this value manually to a string, I get a bit further.
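
For reference, `[object Object]` is what JavaScript produces when a plain object is coerced to a string, which is consistent with the un-interpolated `Fn::Join` mapping being passed through as-is. A minimal sketch of a guard against that, assuming a hypothetical helper `resolveAccountId` (not part of Cube.js or serverless-offline) and an arbitrary placeholder account ID as the fallback:

```javascript
// Hypothetical helper, not part of Cube.js or serverless-offline.
// Falls back to a placeholder account ID when AWS_ACCOUNT_ID is missing
// or is the string form of an unresolved CloudFormation intrinsic.
function resolveAccountId(env, fallback = '123456789012') {
  const raw = env.AWS_ACCOUNT_ID;
  if (!raw || raw === '[object Object]') {
    return fallback;
  }
  return raw;
}

// The Fn::Join mapping from serverless.yml, written as a plain object.
const intrinsic = { 'Fn::Join': ['', [{ Ref: 'AWS::AccountId' }]] };

// Coercing it to a string loses the structure entirely.
const coerced = String(intrinsic);
```

Calling `resolveAccountId(process.env)` in the handler would be one way to avoid building a topic ARN from the coerced value.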

What I then found is that serverless-offline-sns should update the AWS config to set the SNS endpoint to the local one. This works if I create the SNS client in my own handler module:

const AWSHandlers = require('@cubejs-backend/serverless-aws');
const AthenaDriver = require('@cubejs-backend/athena-driver');
const aws = require('aws-sdk');
const sns = new aws.SNS(); // client created in my own handler module
console.log(sns); // has the modified local endpoint values
module.exports = new AWSHandlers({...

But if I temporarily modify the module to see what's going on there, it doesn't get updated despite seemingly using the same method.

// @cubejs-backend/serverless-aws/index.js
const Handlers = require('@cubejs-backend/serverless/Handlers');
const aws = require('aws-sdk');
const sns = new aws.SNS();
console.log(sns); // still has the real AWS endpoint values

So if I do this just to ensure that the sns.publish() is using sns scoped from my module with the updated local SNS endpoint, it gets a bit closer to working.

const AWSHandlers = require('@cubejs-backend/serverless-aws');
const AthenaDriver = require('@cubejs-backend/athena-driver');

const aws = require('aws-sdk');
const sns = new aws.SNS();

class LocalAWSHandlers extends AWSHandlers {
  async sendNotificationMessage(message, type, context) {
    const params = {
      Message: JSON.stringify({ message, type, context }),
      TopicArn: this.topicArn(`${process.env.CUBEJS_APP || 'cubejs'}-process`),
    };
    await sns.publish(params).promise();
  }
}

module.exports = new LocalAWSHandlers({...

But this seems to loop infinitely while waiting:

offline: ANY /cubejs-api/v1/load (λ: cubejs)
🔓 Authentication checks are disabled in developer mode. Please use NODE_ENV=production to enable it.
🦅 Dev environment available at http://localhost:4000
(node:94184) Warning: Accessing non-existent property 'INVALID_ALT_NUMBER' of module exports inside circular dependency
(node:94184) Warning: Accessing non-existent property 'INVALID_ALT_NUMBER' of module exports inside circular dependency
(node:94184) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 uncaughtException listeners added to [process]. Use emitter.setMaxListeners() to increase limit
Serverless: DEBUG[serverless-offline-sns][server]: hello request
Serverless: DEBUG[serverless-offline-sns][server]: {"Action":"Publish","Message":"{\"message\":\"ae3ef2f3c85db12839b60f972c3e5caf\",\"type\":\"queryProcess\",\"context\":{\"authInfo\":[],\"requestId\":\"c01ff1db-423e-4245-8109-0fe41819f8db\",\"dataSource\":\"default\"}}","TopicArn":"arn:aws:sns:eu-west-1:XXXXXXXXXXXX:analytics-local-process","Version":"2010-03-31"}
Serverless: DEBUG[serverless-offline-sns][server]: [{"SubscriptionArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process:103146","Protocol":"http","TopicArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process","Endpoint":"http://127.0.0.1:4005/analytics-local-cubejsProcess","Owner":"","Attributes":{}}]
Serverless: DEBUG[serverless-offline-sns][server]: [{"SubscriptionArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process:103146","Protocol":"http","TopicArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process","Endpoint":"http://127.0.0.1:4005/analytics-local-cubejsProcess","Owner":"","Attributes":{}}]
Serverless: DEBUG[serverless-offline-sns][server]: hello request
Serverless: DEBUG[serverless-offline-sns][server]: {"Action":"Publish","Message":"{\"message\":\"ae3ef2f3c85db12839b60f972c3e5caf\",\"type\":\"queryProcess\",\"context\":{\"authInfo\":[],\"requestId\":\"c01ff1db-423e-4245-8109-0fe41819f8db\",\"dataSource\":\"default\"}}","TopicArn":"arn:aws:sns:eu-west-1:XXXXXXXXXXXX:analytics-local-process","Version":"2010-03-31"}
Serverless: DEBUG[serverless-offline-sns][server]: [{"SubscriptionArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process:103146","Protocol":"http","TopicArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process","Endpoint":"http://127.0.0.1:4005/analytics-local-cubejsProcess","Owner":"","Attributes":{}}]
Serverless: DEBUG[serverless-offline-sns][server]: [{"SubscriptionArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process:103146","Protocol":"http","TopicArn":"arn:aws:sns:eu-west-1:123456789012:analytics-local-process","Endpoint":"http://127.0.0.1:4005/analytics-local-cubejsProcess","Owner":"","Attributes":{}}]

And the client receives:

{"error":"Continue wait"}
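
The `Continue wait` response appears to be Cube.js's long-polling signal rather than a terminal failure: the API answers this way while the query is still queued, and the client is expected to repeat the request. A minimal sketch of such a retry loop, assuming a hypothetical `requestLoad` function that performs one call to `/cubejs-api/v1/load` and resolves to the parsed JSON body (the function name, attempt limit, and delay are illustrative, not taken from the Cube.js client):

```javascript
// Hypothetical client-side retry loop; requestLoad is an assumed function
// that performs one request and resolves to the parsed JSON response body.
async function loadWithRetry(requestLoad, { maxAttempts = 10, delayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const body = await requestLoad();
    // Anything other than "Continue wait" is a final answer (data or a real error).
    if (!body || body.error !== 'Continue wait') {
      return body;
    }
    // Pause before asking again so the background worker has time to finish.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Still waiting after ${maxAttempts} attempts`);
}
```

If the process function never picks up the SNS message, a bounded loop like this exhausts its attempts and throws, which makes the stall visible instead of retrying forever.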

The following log lines are particularly interesting. The first two are caused by Node 14 (I'll try on 12 shortly), but the last one may be the cause, perhaps?

(node:94184) Warning: Accessing non-existent property 'INVALID_ALT_NUMBER' of module exports inside circular dependency
(node:94184) Warning: Accessing non-existent property 'INVALID_ALT_NUMBER' of module exports inside circular dependency
(node:94184) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 uncaughtException listeners added to [process]. Use emitter.setMaxListeners() to increase limit

I also noticed that most of the sns ARNs in that log use 123456789012, but two of them use my real account ID (which I had set as an env var). I guess that two things are generating SNS ARNs differently too?

I'm partly just documenting what I'm trying here but also hoping someone may shed some light on getting this to work.

benswinburne commented 4 years ago

> I also noticed that most of the sns ARNs in that log use 123456789012, but two of them use my real account ID (which I had set as an env var). I guess that two things are generating SNS ARNs differently too?

Setting the account ID works in terms of making them both use the same one.

custom:
  serverless-offline-sns:
    accountId: 123456789012

Unsure why this is without further investigation, but this seems to work.
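
One possible reading of the mismatch: the publisher builds its topic ARN from `AWS_ACCOUNT_ID`, while serverless-offline-sns registers subscriptions under its own account ID, so a message published to one ARN never reaches a subscription on the other. A small sketch of that comparison, assuming the standard `arn:aws:sns:<region>:<account>:<topic>` layout; `buildTopicArn` is a hypothetical helper and `111122223333` stands in for a real account ID:

```javascript
// Hypothetical helper that mirrors the standard SNS topic ARN layout.
function buildTopicArn(region, accountId, topic) {
  return `arn:aws:sns:${region}:${accountId}:${topic}`;
}

const topic = 'analytics-local-process';

// ARN the subscription is registered under (account ID used by the offline plugin).
const subscribedArn = buildTopicArn('eu-west-1', '123456789012', topic);

// ARN the publisher uses when AWS_ACCOUNT_ID holds a different account ID.
const mismatchedArn = buildTopicArn('eu-west-1', '111122223333', topic);

// ARN the publisher uses once both sides agree on the account ID.
const alignedArn = buildTopicArn('eu-west-1', '123456789012', topic);

// A publish can only reach the subscription when the ARNs are identical.
const matchesBefore = mismatchedArn === subscribedArn;
const matchesAfter = alignedArn === subscribedArn;
```

Under this reading, only the aligned ARN matches the subscription, which would explain why pinning `accountId` helps.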

Here's a working serverless.yml

service: analytics

provider:
  name: aws
  runtime: nodejs12.x
  region: ${opt:region, 'eu-west-1'}
  stage: ${env:CIRCLE_BRANCH, opt:stage, "local"}

  memorySize: 512
  endpointType: regional
  packageManager: yarn

  iamRoleStatements:
    - Effect: 'Allow'
      Action:
        - 'sns:*'
        # Athena permissions - these need the scope reducing massively
        - 'athena:*'
        - 's3:*'
        - 'glue:*'
      Resource: '*'

  vpc:
    securityGroupIds: [!Ref ServerlessSecurityGroup]
    subnetIds:
      - subnet-id-1
      - subnet-id-2

  environment:
    CUBEJS_DB_TYPE: athena
    CUBEJS_AWS_KEY: ${env:CUBEJS_AWS_KEY}
    CUBEJS_AWS_SECRET: ${env:CUBEJS_AWS_SECRET}
    CUBEJS_AWS_REGION: ${env:CUBEJS_AWS_REGION}
    CUBEJS_AWS_S3_OUTPUT_LOCATION: ${env:CUBEJS_AWS_S3_OUTPUT_LOCATION}
    CUBEJS_API_SECRET: ${env:CUBEJS_API_SECRET}
    NODE_ENV: ${env:NODE_ENV, self:provider.stage}
    REDIS_URL: ${env:REDIS_URL, self:custom.REDIS_URL}
    AWS_ACCOUNT_ID: ${env:AWS_ACCOUNT_ID, self:custom.AWS_ACCOUNT_ID}
    # Do not change. https://github.com/cube-js/cube.js/issues/1014
    CUBEJS_APP: '${self:service.name}-${self:provider.stage}'

package:
  individually: false
  excludeDevDependencies: true
  exclude:
    - node_modules/aws-sdk/**

functions:
  cubejs:
    handler: cube.api
    timeout: 30
    events:
      - http:
          path: /
          method: GET
          cors: true
      - http:
          path: /{proxy+}
          method: ANY
          cors: true
  cubejsProcess:
    handler: cube.process
    timeout: 630
    events:
      # Do not change. https://github.com/cube-js/cube.js/issues/1014
      - sns: '${self:service.name}-${self:provider.stage}-process'

custom:
  config:
    CACHE_INSTANCE_SIZE: cache.t3.micro
  serverless-offline:
    httpPort: 5005
    lambdaPort: 3005
    noPrependStageInUrl: true
  serverless-offline-sns:
    port: 4005
    accountId: 123456789012
  REDIS_URL:
    Fn::Join:
      - ''
      - - 'redis://'
        - Fn::GetAtt: [ElasticCacheCluster, RedisEndpoint.Address]
  AWS_ACCOUNT_ID:
    Fn::Join:
      - ''
      - - Ref: 'AWS::AccountId'

resources:
  Resources:
    ServerlessSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: SecurityGroup for Serverless Functions
        VpcId: vpc-id

    ServerlessStorageSecurityGroup:
      Type: AWS::EC2::SecurityGroup
      Properties:
        GroupDescription: Ingress for Redis Cluster
        VpcId: vpc-id
        SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: '6379'
            ToPort: '6379'
            SourceSecurityGroupId:
              Ref: ServerlessSecurityGroup

    ServerlessCacheSubnetGroup:
      Type: AWS::ElastiCache::SubnetGroup
      Properties:
        Description: 'Cache Subnet Group'
        SubnetIds:
          - subnet-id-1
          - subnet-id-2

    ElasticCacheCluster:
      DependsOn: ServerlessStorageSecurityGroup
      Type: AWS::ElastiCache::CacheCluster
      Properties:
        AutoMinorVersionUpgrade: true
        Engine: redis
        CacheNodeType: ${self:custom.config.CACHE_INSTANCE_SIZE}
        NumCacheNodes: 1
        VpcSecurityGroupIds:
          - 'Fn::GetAtt': ServerlessStorageSecurityGroup.GroupId
        CacheSubnetGroupName:
          Ref: ServerlessCacheSubnetGroup

plugins:
  - serverless-dotenv-plugin
  - serverless-offline-sns
  - serverless-offline
  - serverless-express

And a handler:

const AWSHandlers = require('@cubejs-backend/serverless-aws');
const AthenaDriver = require('@cubejs-backend/athena-driver');

const aws = require('aws-sdk');
const sns = new aws.SNS();

// https://github.com/cube-js/cube.js/issues/121
class LocalAWSHandlers extends AWSHandlers {
  async sendNotificationMessage(message, type, context) {
    const params = {
      Message: JSON.stringify({ message, type, context }),
      TopicArn: this.topicArn(`${process.env.CUBEJS_APP || 'cubejs'}-process`),
    };
    await sns.publish(params).promise();
  }
}

module.exports = new LocalAWSHandlers({
  externalDbType: 'athena',
  externalDriverFactory: () =>
    new AthenaDriver({
      accessKeyId: process.env.CUBEJS_AWS_KEY,
      secretAccessKey: process.env.CUBEJS_AWS_SECRET,
      region: process.env.CUBEJS_AWS_REGION,
      S3OutputLocation: process.env.CUBEJS_AWS_S3_OUTPUT_LOCATION,
    }),
});
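
The override above depends on serverless-offline-sns having already pointed the SDK at the local endpoint. A possible alternative is to compute the client options explicitly and pass them to `new aws.SNS(options)`. This sketch assumes serverless-offline sets `IS_OFFLINE` in the function's environment and that the local SNS server listens on port 4005, as in the serverless.yml above; `snsClientOptions` is a hypothetical helper:

```javascript
// Hypothetical helper, not part of Cube.js: builds options for `new aws.SNS(options)`.
// Assumes IS_OFFLINE is set when running under serverless-offline.
function snsClientOptions(env, localPort = 4005) {
  if (env.IS_OFFLINE) {
    return {
      endpoint: `http://127.0.0.1:${localPort}`,
      region: env.AWS_REGION || 'eu-west-1',
    };
  }
  // Outside offline mode, let the SDK resolve its endpoint as usual.
  return {};
}

// Usage inside the handler module (requires aws-sdk):
// const sns = new aws.SNS(snsClientOptions(process.env));
```

Keeping the endpoint decision in one small function makes it easy to log or test which endpoint the handler will publish to.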