CoorpAcademy / serverless-plugins

Collection of serverless plugins :zap:

ResourceNotFoundException: Invalid ShardId in ShardIterator #154

Open mshick opened 3 years ago

mshick commented 3 years ago

I've been having issues with serverless-offline-dynamodb-streams for as long as I've been using it (6 months on the current project). After 1-2 days I get the following error and the stack hangs:

ResourceNotFoundException: Invalid ShardId in ShardIterator
      at Request.extractError (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/protocol/json.js:52:27)
      at Request.callListeners (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:688:14)
      at Request.transition (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:22:10)
      at AcceptorStateMachine.runTo (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/state_machine.js:14:12)
      at /Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/state_machine.js:26:10
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:38:9)
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:690:12)
      at Request.callListeners (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
      at Request.emit (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:688:14)
      at Request.transition (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:22:10)
      at AcceptorStateMachine.runTo (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/state_machine.js:14:12)
      at /Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/state_machine.js:26:10
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:38:9)
      at Request.<anonymous> (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/request.js:690:12)
      at Request.callListeners (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/sequential_executor.js:116:18)
      at callNextListener (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/sequential_executor.js:96:12)
      at IncomingMessage.onEnd (/Users/mshick/Code/takeshape/takeshape/node_modules/.pnpm/aws-sdk@2.787.0/node_modules/aws-sdk/lib/event_listeners.js:313:13)
      at IncomingMessage.emit (events.js:327:22)
      at IncomingMessage.EventEmitter.emit (domain.js:486:12)
      at endReadableNT (_stream_readable.js:1327:12)
      at processTicksAndRejections (internal/process/task_queues.js:80:21)

It sounds very similar to #43. When I reset the DB, I get a working serverless offline stack again. For a while we were on out-of-date serverless packages; I saw that PR, so I set about getting everything current. All plugins are now up to date, but I still encounter the problem.

I started debugging the issue today, logging in various places, looking for something I could key in on to catch these invalid ShardIds, but found nothing.

I attempted to short-circuit the error emitted from read with the code below, and it seemed to get everything working again:

...
    function gotRecords(err, data) {
      // if (err) return checkpoint.emit('error', err);
      if (err) return null
      setTimeout(readable.push.bind(readable), options.readInterval || 500, data.Records);
    }
...

I've tested reading and writing from my local dynamo without issues.

My question, then, is why this might be working, and what the appropriate fix is. I can help debug and test, though I am a bit out of my depth on the actual solution.
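One possible alternative to dropping the error silently is to treat ResourceNotFoundException as a sign that the shard iterator has gone stale and to request a fresh one before retrying. In the AWS service, stream records are retained for 24 hours, which would fit the 1-2 day pattern, but whether DynamoDB Local behaves the same way is an assumption here, not something confirmed in this thread. The sketch below is not the plugin's code: readWithRecovery, getRecords, and refreshIterator are hypothetical names, and the two callbacks are supplied by the caller (with aws-sdk v2 they could wrap DynamoDBStreams getRecords and describeStream plus getShardIterator).

```javascript
// Hypothetical helper; not part of serverless-offline-dynamodb-streams or aws-sdk.
// getRecords(iterator): caller-supplied async function that resolves to
//   { Records, NextShardIterator } or rejects with an error carrying `code`.
// refreshIterator(): caller-supplied async function that resolves to a new
//   shard iterator string.
async function readWithRecovery(iterator, getRecords, refreshIterator, maxRefreshes = 1) {
  let current = iterator;
  for (let attempt = 0; ; attempt += 1) {
    try {
      const data = await getRecords(current);
      return {
        records: data.Records || [],
        nextIterator: data.NextShardIterator || null,
      };
    } catch (err) {
      // Assumption: only a stale or unknown shard is worth a refresh;
      // every other error is rethrown unchanged so it stays visible.
      const stale = Boolean(err) && err.code === 'ResourceNotFoundException';
      if (!stale || attempt >= maxRefreshes) throw err;
      current = await refreshIterator();
    }
  }
}
```

Unlike returning null on every error, this keeps unrelated failures (bad credentials, unreachable endpoint) visible and gives up after a bounded number of refreshes.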

mshick commented 3 years ago

Pasting some relevant sections of my serverless.yml here, in case they offer any clues...

custom:
  streams:
    roles: ${self:custom.resources.dynamoTables.roles.LatestStreamArn, 'arn:aws:dynamodb:ddblocal:000000000000:table/${self:custom.projectName}.dev.roles/stream/2019-08-14T18:57:07.218'}
  serverless-offline:
    noPrependStageInUrl: true
    useWorkerThreads: true
    allowCache: true
  serverless-offline-dynamodb-streams:
    endpoint: http://0.0.0.0:8000
    region: us-east-1
    accessKeyId: root
    secretAccessKey: root
    skipCacheInvalidation: true
    readInterval: 500

plugins:
  - serverless-domain-manager
  - serverless-plugin-warmup
  - serverless-api-compression
  - serverless-webpack
  - serverless-offline-dynamodb-streams
  - serverless-offline-sns
  - serverless-offline
  - serverless-plugin-split-stacks
  - serverless-sentry

functions:
  introspectionCache:
    handler: src/functions/introspection-cache/handler.handler
    timeout: 60
    events:
      - stream:
          type: dynamodb
          batchSize: 10
          arn: ${self:custom.resources.dynamoTables.schema.LatestStreamArn}
          startingPosition: LATEST

mdrijwan commented 3 years ago

I'm facing the same issue.

mattjennings commented 3 years ago

I was having this issue using the official DynamoDB Local docker container. The shard error seems to be related to it (I encountered the same error trying to use streams outside of serverless-offline entirely).

I switched to LocalStack and streams are working now. For anyone using docker-compose, this is my config:

services:
  localstack:
    image: localstack/localstack
    ports:
      - '4566:4566'
      - '4571:4571'
      - '8000:4566' # optional - exposes edge port on 8000 as well since it's the common dynamodb port
    environment:
      - SERVICES=s3,sns,sqs,apigateway,lambda,dynamodb,dynamodbstreams,cloudformation
    volumes:
      - '${TMP_DIR:-/tmp/localstack}:${TMP_DIR:-/tmp/localstack}'
      - '/var/run/docker.sock:/var/run/docker.sock'

dmitriy-baltak commented 3 years ago

Same issue here, but we are not using Docker to start DynamoDB. For now we follow the advice from the docs and start it with the serverless dynamodb start --migrate command. It looks like we will have to run it in Docker with the LocalStack image instead.

ondrejrohon commented 2 years ago

I had the same issue and restarting macOS helped 😂

martinjuhasz commented 1 year ago

Running into the same issue. Unable to resolve it currently.

// Edit: What seemed to help in my case was removing the Docker volume (for the data, not the image) for OpenSearch/Elasticsearch and recreating it.

voccer-pionero commented 1 year ago

Does anyone have any idea?