github / rally

GitHub <> Rally integration
MIT License

Trying to resolve "connection reset by peer" on delivery issue #364

Closed jcreinhold closed 2 years ago

jcreinhold commented 2 years ago

Overview

I'm trying to deploy this at work using AWS Fargate (simpler deployment options, like AWS Lambda or a local server, aren't viable for obscure reasons). For reference, I'm using the Docker deployment scheme; the Dockerfile I'm using is included below.

When I open a browser and go to the IP address of the Fargate task running this app, it shows the probot landing page.

However, on the GitHub app end, I'm getting "connection reset by peer" on the deliveries (as shown in the "Advanced" tab of the app).

Potential problems

  1. There is a firewall issue (e.g., the app is trying to establish a connection to the enterprise GitHub on some port other than 80 or 443, or is not using TCP)
  2. The default Node.js settings for keepAliveTimeout and headersTimeout are too short.
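
One way to test the first hypothesis might be to run a small TCP probe from inside the container against the enterprise GitHub host. This is only a sketch: `probeTcp` is a hypothetical helper, not part of Rally or Probot, and it only tells you whether a TCP connection can be opened from the task. It says nothing about TLS certificates or about inbound webhook traffic.

```javascript
const net = require("net");

// Hypothetical helper: try to open a TCP connection to host:port and
// report whether it succeeded, timed out, or failed with an error code.
function probeTcp(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (result) => {
      socket.destroy(); // always release the socket before reporting
      resolve(result);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish({ port, ok: true }));
    socket.once("timeout", () => finish({ port, ok: false, error: "timeout" }));
    socket.once("error", (err) =>
      finish({ port, ok: false, error: err.code || err.message })
    );
  });
}
```

For example, `probeTcp(process.env.GHE_HOST, 443).then(console.log)` would print the result for port 443. An `ECONNRESET` or timeout here would point toward the network path rather than the app.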

Questions for dev team

  1. Is the app trying to establish a connection on a port other than 80 or 443, or using a protocol other than TCP?
  2. Is there a way to modify this application to change the values of keepAliveTimeout and headersTimeout without a major rewrite of the code base?
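
For context on the second question: on a plain Node.js `http.Server`, both settings are ordinary writable properties (`keepAliveTimeout` defaults to 5 seconds), so they can be changed without touching request-handling code. The sketch below shows this on a stand-in server. How to reach the server instance that Probot creates is not covered here and is an assumption; `applyTimeouts` is a hypothetical helper, and the millisecond values are illustrative.

```javascript
const http = require("http");

// Hypothetical helper (not part of Rally or Probot): raise the two timeouts
// on any Node.js http.Server. The millisecond values are illustrative only.
function applyTimeouts(server, keepAliveMs = 65000, headersMs = 66000) {
  server.keepAliveTimeout = keepAliveMs; // idle time before a keep-alive socket is closed
  server.headersTimeout = headersMs;     // max time allowed to receive the request headers
  return server;
}

// Stand-in server; in the real app this would be whatever http.Server the
// framework creates (assumption: that instance is accessible).
const server = applyTimeouts(
  http.createServer((req, res) => res.end("ok"))
);
```

Keeping `headersTimeout` slightly above `keepAliveTimeout` is a common precaution when a proxy or load balancer reuses idle connections.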

Environment and additional context

The URLs that are listed in the delivery (under "Advanced" in the Rally GitHub app) appear to be correctly formatted (I can follow them in my browser and get a response back from our enterprise GitHub server).

The security groups of the Fargate task are configured to allow traffic to the enterprise GitHub server on ports 80 and 443 (which is why I'm able to see the probot landing page), and I started the app on port 80 (so the webhook and homepage URLs are just the IP address). I'm not currently using a load balancer, although I did previously, with the same result. (In that case, I mapped TCP traffic from port 80 to port 3000 and started the app on port 3000.)

Dockerfile

Note that the GitHub app ID (APP_ID), Rally API key (RALLY_API_KEY), and webhook secret (WEBHOOK_SECRET) are injected into the environment at runtime. rally-github-config exists as a repo under ${organization} on the enterprise GitHub server and contains a .github/rally.yml file with the workspace OID and projects set.

# syntax=docker/dockerfile:1
FROM node:14-alpine AS builder-base
LABEL maintainer="Jacob Reinhold"

FROM builder-base AS builder
USER root
WORKDIR /opt

# hadolint ignore=DL3018
RUN \
  apk add --no-cache git && \
  git clone -b v1.2.1 --depth 1 https://github.com/github/rally.git

USER node

FROM node:14-alpine AS final

COPY --chown=node:node "rally.pem" "/opt/rally/.ssh/rally.pem"
COPY --chown=node:node --from=builder "/opt/rally/package.json" "/opt/rally/package.json"
COPY --chown=node:node --from=builder "/opt/rally/index.js" "/opt/rally/index.js"
COPY --chown=node:node --from=builder "/opt/rally/lib" "/opt/rally/lib"

USER root
# hadolint ignore=DL3018
RUN \
  apk add --no-cache libcap make python3 && \
  setcap cap_net_bind_service=+ep "$(readlink -f "$(which node)")"

USER node
WORKDIR /opt/rally
EXPOSE 80
EXPOSE 443
# https://probot.github.io/docs/configuration/
ENV BOT_NAME="Rally"
ENV GHE_HOST="github.mycompanyname.com"
ENV GHE_PROTOCOL="https"
ENV LOG_LEVEL="debug"
ENV NODE_ENV="production"
ENV ORG_CONFIG_REPO_NAME="rally-github-config"
ENV PORT=80
ENV PRIVATE_KEY_PATH="/opt/rally/.ssh/rally.pem"
ENV RALLY_SERVER="https://rally1.rallydev.com"

ARG organization
ENV GH_ORG=${organization}

RUN npm install

# Does not start properly when using the exec form of CMD
# hadolint ignore=DL3025
CMD npm start

jcreinhold commented 2 years ago

I'm fairly confident this is a firewall issue, perhaps related to SSL/TLS certificates. I'll re-open this issue if it turns out to be an issue with the app.

If you have any guidance, though, or know of any common problems when setting up probot apps that sound similar to this issue, I'd appreciate it 😅