SNS topics available to Worker Service

gabrieljoelc commented 1 year ago

I have one application and two services that use the same code entrypoint (a monolithic application).

Web app:

# web/manifest.yml
name: web
type: Load Balanced Web Service

publish:
  topics:
    - name: my-topic

Subscriber worker:

# subs/manifest.yml
name: subs
type: Worker Service

subscribe:
  topics:
    - name: my-topic
      service: web

The way the documentation reads, the ARN convention should be:

arn:aws:sns:us-east-1:11111111111111:my-topic

but in reality it is arn:aws:sns:us-east-1:11111111111111:<application name>-<environment name>-<service name>-<topic name> like:

arn:aws:sns:us-east-1:11111111111111:myapplicationname-dev-web-my-topic

My code handlers are in the same application and do something like:

// subscribers/my-topic-handler.js

export const topic = 'my-topic';
export function handler(data) {
  // process message
}

I have some middleware that routes on message.TopicArn to the correct handler. In the web service, I have the COPILOT_SNS_TOPIC_ARNS variable, but I have discovered there is no way to infer the topic in the worker service because:

I don't have COPILOT_SNS_TOPIC_ARNS there
There is no environment variable that exposes subscribe.topics.service, which I would need to split off the last part of the TopicArn

I am working around this by hard-coding the service name in subscribers/my-topic-handler.js as well, but I feel that application code shouldn't have to do that.

bvtujo commented 1 year ago

As far as I understand it, your problem is that you want to process messages based only on the name of the SNS topic you specified in the Copilot publish manifest, not any of the other data like which service, env, or app it came from. Having app-env-pub1-my-topic routed to a different handler from app-env-pub2-my-topic does not work for you.

Copilot typically tries to keep services decoupled by only exposing resources of one service to that service's task role. Pub/sub is an exception to that, where the Worker Service needs at some point to know about the SNS topics it's subscribing to. However, all that logic is handled when generating CloudFormation templates.

Our idea when designing worker services was to keep them isolated--that is, the tasks in the service shouldn't know or care what SNS topics the queue is subscribed to; they should solely focus on processing messages based on their characteristics or which queue they come from.

That's why services with publish have environment variables containing the SNS topics they need to send messages to, and worker services have environment variables containing the URIs of the queues they're in charge of processing. So it seems like your use case is particularly thorny for Copilot to handle out of the box.

Passing ARNs in as environment variables isn't a great solution, as you know, since it's not particularly portable between environments.

Would it be possible for you to use topic-specific queues and do the routing that way? This would avoid needing to generate a mapping of SNS topic ARN to message handler, but has the tradeoff of requiring you to check multiple queues in your event loop or run separate threads for each queue.

subscribe:
  topics:
  - name: my-topic
    service: web
    queue: true
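
With topic-specific queues, routing can key off the queue a message came from instead of its TopicArn. The following is a minimal sketch, not Copilot's own API: it assumes COPILOT_TOPIC_QUEUE_URIS holds a JSON object mapping queue keys to queue URIs, and `buildQueueRoutes`, the key `webMyTopicEventsQueue`, and the queue URI are hypothetical placeholders. The real keys for a deployed service should be checked with copilot svc show.

```javascript
// Hypothetical helper: pair each topic-specific queue URI with a handler.
// `handlersByQueueKey` maps keys of COPILOT_TOPIC_QUEUE_URIS to handler functions.
function buildQueueRoutes(queueUrisJson, handlersByQueueKey) {
  const queueUris = JSON.parse(queueUrisJson || '{}');
  const routes = [];
  for (const [key, uri] of Object.entries(queueUris)) {
    const handler = handlersByQueueKey[key];
    if (!handler) {
      // Fail at bootstrap rather than silently ignoring a subscribed queue.
      throw new Error(`No handler registered for queue key "${key}"`);
    }
    routes.push({ key, uri, handler });
  }
  return routes;
}

// Made-up value standing in for process.env.COPILOT_TOPIC_QUEUE_URIS.
const exampleQueueUris = JSON.stringify({
  webMyTopicEventsQueue: 'https://sqs.us-east-1.amazonaws.com/111111111111/hypothetical-queue',
});

const routes = buildQueueRoutes(exampleQueueUris, {
  webMyTopicEventsQueue: (data) => data, // placeholder handler
});
```

The event loop would then poll each `route.uri` and pass the received messages to `route.handler`, which is the multiple-queue tradeoff mentioned above.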

Alternatively, you could maintain a list of the Copilot services that publish messages in the environment variables of your worker service. This is a little fragile, unfortunately:

subscribe:
  topics:
  - name: my-topic
    service: web
  - name: my-topic
    service: pub1
  - name: my-other-topic
    service: pub2

variables:
  SUBSCRIBED_SERVICES: '{
    "my-topic": ["web", "pub1"],
    "my-other-topic": ["pub2"]
  }'

but would allow you, in conjunction with the COPILOT_APPLICATION_NAME and COPILOT_ENVIRONMENT_NAME default environment variables, to fully parse the SNS topic ARNs like you need to.
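
As a rough sketch of that parsing, assuming the app-env-service-topic naming described in this thread and a SUBSCRIBED_SERVICES variable shaped like the one above: `topicNameFromArn` is a hypothetical helper, not something Copilot provides. It rebuilds each expected resource name and compares it exactly, which sidesteps the ambiguity of splitting on hyphens when service or topic names contain hyphens themselves.

```javascript
// Hypothetical helper: recover the manifest topic name from a Copilot SNS topic ARN.
// Assumes the <app>-<env>-<service>-<topic> naming and a SUBSCRIBED_SERVICES
// variable shaped like { "<topic>": ["<service>", ...] }.
function topicNameFromArn(topicArn, env = process.env) {
  const app = env.COPILOT_APPLICATION_NAME;
  const environment = env.COPILOT_ENVIRONMENT_NAME;
  const subscribed = JSON.parse(env.SUBSCRIBED_SERVICES || '{}');

  // The resource name is the last colon-separated segment of the ARN.
  const resource = topicArn.split(':').pop();

  for (const [topic, services] of Object.entries(subscribed)) {
    for (const service of services) {
      if (resource === `${app}-${environment}-${service}-${topic}`) {
        return topic;
      }
    }
  }
  return undefined; // not a topic this worker knows about
}

// Example values taken from this thread.
const exampleEnv = {
  COPILOT_APPLICATION_NAME: 'myapplicationname',
  COPILOT_ENVIRONMENT_NAME: 'dev',
  SUBSCRIBED_SERVICES: '{"my-topic": ["web", "pub1"], "my-other-topic": ["pub2"]}',
};
const topicName = topicNameFromArn(
  'arn:aws:sns:us-east-1:11111111111111:myapplicationname-dev-web-my-topic',
  exampleEnv
);
```

Here `topicName` is `'my-topic'`, which can be matched against the `topic` export of each handler module.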

Does any of this help?

gabrieljoelc commented 1 year ago

Early on, I could tell that Copilot worked better when creating a Worker Service per subscriber rather than for monolithic subscribers. This leaning is fine in full microservice design. However, for early stage startups with small teams, applications often start as monolithic (and then are decomposed later as load characteristics and features cause the design to emerge and be refactored). Being able to build a monolithic "subscriber" service with atomic "handlers" per topic would be very helpful in this case.

Alternate solution 1 - handler per queue

This is the best workaround, so would I infer the topic from COPILOT_TOPIC_QUEUE_URIS? Consider this application code:

// in web / Load Balanced Web Service handling an http POST
async function handlerHttp(body) {
  const { id } = body;

  // NOTE: application bootstrap code builds up `container.publishers` object from `COPILOT_SNS_TOPIC_ARNS`
  await container.publishers['my-topic'].publish({ id });
}

// in subs / Worker Service
async function sqsMiddleware(messages) {
  for (const message of messages) {
    const { Body, TopicArn } = message;

    // NOTE: application bootstrap code builds up `container.handlers` from `COPILOT_TOPIC_QUEUE_URIS`?
    // Look up the handler registered for this topic and invoke it with the message body.
    await container.handlers[TopicArn](Body);
  }
}
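
The middleware above reads TopicArn directly off each message. If the messages are raw SQS records delivered from SNS without raw message delivery enabled, the TopicArn instead sits inside the JSON string in Body, alongside the published Message. A minimal sketch of unwrapping that envelope, under that assumption and assuming the publisher sent JSON; `unwrapSnsMessage` is a hypothetical helper name:

```javascript
// Hypothetical helper: unwrap an SQS message that carries an SNS notification.
// Assumes raw message delivery is disabled, so Body is the SNS JSON envelope.
function unwrapSnsMessage(sqsMessage) {
  const envelope = JSON.parse(sqsMessage.Body);
  return {
    topicArn: envelope.TopicArn,
    // Message is itself a string; it is assumed here to hold JSON published by the app.
    payload: JSON.parse(envelope.Message),
  };
}

// Made-up SQS record shaped like an SNS-to-SQS delivery.
const exampleSqsMessage = {
  Body: JSON.stringify({
    Type: 'Notification',
    TopicArn: 'arn:aws:sns:us-east-1:11111111111111:myapplicationname-dev-web-my-topic',
    Message: JSON.stringify({ id: 42 }),
  }),
};

const unwrapped = unwrapSnsMessage(exampleSqsMessage);
```

`unwrapped.topicArn` can then be used for the handler lookup, and `unwrapped.payload` passed to the handler.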

One question: why are the property keys in COPILOT_TOPIC_QUEUE_URIS camelcase but the ones in COPILOT_SNS_TOPIC_ARNS are dash-cased? This will make inference more difficult.

To be fair, I haven't configured a per-topic queue yet, and the documentation for COPILOT_SNS_TOPIC_ARNS doesn't show the ARN naming correctly. Maybe IRL the keys are dash-cased and the documentation is wrong here?

Alternate solution 2 - hardcode topics in custom environment variable

Yes, I considered this and decided against for the same reason you gave: it's fragile.

Alternate solution 3 - API Gateway

I realize an API Gateway would help here, but that's not easily attainable if you're a small team trying to do all DevOps via Copilot.

bvtujo commented 1 year ago

Yes, Copilot always uses the app-env-svc-topic naming convention for resources when possible to avoid collisions between services, so you can have different services each exposing an SNS topic with the same name. We don't view them as resources that should necessarily be shared, just as a means to an end for getting messages into the destination queue.

The reason for the differing JSON keys is that, unfortunately, topics have a TopicName field which can contain hyphens, but queues are identified in part by their CFN logical ID, which can't contain hyphens. That design decision led us to use names for the queue keys that resemble their logical IDs, when it would perhaps have been better to name them things like my-service-topic-name-queue for parity with publish.

bvtujo commented 1 year ago

You can always look at what actual variables are exposed to a given deployed service via copilot svc show.
