apollographql / rover

The CLI for Apollo GraphOS
https://rover.apollo.dev
Other
408 stars 85 forks source link

Bug: Router randomly disconnecting with enterprise features #1621

Closed WolffRuoff closed 1 year ago

WolffRuoff commented 1 year ago

Description

Whenever I use the rover dev command, after 30 seconds or so the router will seemingly crash but rover won't say anything in the terminal.

The Sandbox will have this message appear: image

even though the terminal that ran the command still looks like this: image

This command recommended returns:

⚠️  Connection refused (ECONNREFUSED)
Is the address correct?
Is the server running?

When run in debug mode there are no new logs to indicate something happened. I would try to debug this further but I'm not sure how to go about viewing the router logs when it is composed via the rover dev command (if this isn't possible this would be a nice feature)

Steps to reproduce

Set up a router.yaml like so:

supergraph:
  listen: 0.0.0.0:9999
  path: /router/graphql
  introspection: true
authentication:
  jwt:
    jwks:
      - url: SAMPLE
    header_name: Authorization
    header_value_prefix: Bearer
health_check:
  listen: 0.0.0.0:9999
  enabled: true
headers:
  all:
    request:
      - propagate:
          matching: .*
traffic_shaping:
  router:
    timeout: 9s
  all:
    deduplicate_query: true 
    compression: gzip
include_subgraph_errors:
  all: true
sandbox:
  enabled: true
homepage:
  enabled: false

and run the command rover dev --router-config ./router.yaml --name User --schema User.graphql --url http://localhost:3009/user/graphql --elv2-license=accept with your equivalent subgraph

Expected result

The sandbox should work and receive queries

Actual result

After 30 seconds or so it disconnects and every query has this error:

{
  "data": {},
  "error": {
    "message": "Failed to fetch",
    "stack": "TypeError: Failed to fetch\n    at handleRequest.o.legacyIncludeCookies (https://embeddable-sandbox.cdn.apollographql.com/_latest/embeddable-sandbox.umd.production.min.js?runtime=apollo-router@1.19.1:1:64174)\n    at https://embeddable-sandbox.cdn.apollographql.com/_latest/embeddable-sandbox.umd.production.min.js?runtime=apollo-router@1.19.1:1:11682\n    at Generator.next (<anonymous>)\n    at https://embeddable-sandbox.cdn.apollographql.com/_latest/embeddable-sandbox.umd.production.min.js?runtime=apollo-router@1.19.1:1:774\n    at new Promise (<anonymous>)\n    at s (https://embeddable-sandbox.cdn.apollographql.com/_latest/embeddable-sandbox.umd.production.min.js?runtime=apollo-router@1.19.1:1:519)\n    at H (https://embeddable-sandbox.cdn.apollographql.com/_latest/embeddable-sandbox.umd.production.min.js?runtime=apollo-router@1.19.1:1:10560)\n    at c (https://embeddable-sandbox.cdn.apollographql.com/_latest/embeddable-sandbox.umd.production.min.js?runtime=apollo-router@1.19.1:1:67165)"
  }
}

Environment

Run rover info and paste the results here

Rover Info:
Version: 0.14.2
Install Location: /Users/ruoffe/.rover/bin/rover
OS: Mac OS 13.1.0 [64-bit]
Shell: /bin/zsh

Additionally we're on router 1.19.1

WolffRuoff commented 1 year ago

I updated to Rover 0.15.0 so that I could view the router logs and here is the trace when it disconnects:

   TRACE rover::command::dev::router::runner: flushing buffer
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: drop_stream_ref; stream=Stream { id: StreamId(1), state: State { inner: Closed(EndStream) }, is_counted: false, ref_count: 2, next_pending_send: None, is_pending_send: false, send_flow: FlowControl { window_size: Window(64824), available: Window(0) }, requested_send_capacity: 0, buffered_send_data: 0, send_task: Some(Waker { data: 0x7fed10b95af0, vtable: 0x1091bb820 }), pending_send: Deque { indices: None }, next_pending_send_capacity: None, is_pending_send_capacity: false, send_capacity_inc: true, next_open: None, is_pending_open: false, is_pending_push: false, next_pending_accept: None, is_pending_accept: false, recv_flow: FlowControl { window_size: Window(2097043), available: Window(2097043) }, in_flight_recv_data: 109, next_window_update: None, is_pending_window_update: false, reset_at: None, next_reset_expire: None, pending_recv: Deque { indices: Some(Indices { head: 1, tail: 2 }) }, is_recv: true, recv_task: None, pending_push_promises: Queue { indices: None, _p: PhantomData<h2::proto::streams::stream::NextAccept> }, content_length: Remaining(0) }
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: transition_after; stream=StreamId(1); state=State { inner: Closed(EndStream) }; is_closed=true; pending_send_empty=true; buffered_send_data=0; num_recv=0; num_send=0
    at src/command/dev/router/runner.rs:201

  DEBUG rover::command::dev::router::runner: uplink response Response { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("uplink.api.apollographql.com")), port: None, path: "/", query: None, fragment: None }, status: 200, headers: {"content-type": "application/json;charset=utf-8", "function-execution-id": "nbqdn1w45qqc", "x-cloud-trace-context": "1f4aa5df0bf53f7bcbf2cc4dec696ea1", "date": "Mon, 12 Jun 2023 15:03:32 GMT", "server": "Google Frontend", "content-length": "109", "via": "1.1 google", "alt-svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000"} }
    at src/command/dev/router/runner.rs:200

  TRACE rover::command::dev::router::runner: release_capacity; size=109
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: release_connection_capacity; size=109, connection in_flight_data=109
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: release_capacity; size=0
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: release_connection_capacity; size=0, connection in_flight_data=0
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: drop_stream_ref; stream=Stream { id: StreamId(1), state: State { inner: Closed(EndStream) }, is_counted: false, ref_count: 1, next_pending_send: None, is_pending_send: false, send_flow: FlowControl { window_size: Window(64824), available: Window(0) }, requested_send_capacity: 0, buffered_send_data: 0, send_task: Some(Waker { data: 0x7fed10b95af0, vtable: 0x1091bb820 }), pending_send: Deque { indices: None }, next_pending_send_capacity: None, is_pending_send_capacity: false, send_capacity_inc: true, next_open: None, is_pending_open: false, is_pending_push: false, next_pending_accept: None, is_pending_accept: false, recv_flow: FlowControl { window_size: Window(2097043), available: Window(2097152) }, in_flight_recv_data: 0, next_window_update: None, is_pending_window_update: false, reset_at: None, next_reset_expire: None, pending_recv: Deque { indices: None }, is_recv: false, recv_task: None, pending_push_promises: Queue { indices: None, _p: PhantomData<h2::proto::streams::stream::NextAccept> }, content_length: Remaining(0) }
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: transition_after; stream=StreamId(1); state=State { inner: Closed(EndStream) }; is_closed=true; pending_send_empty=true; buffered_send_data=0; num_recv=0; num_send=0
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: client::dispatch::Sender dropped
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: {"timestamp":"2023-06-12T15:03:31.966885Z","level":"TRACE","connection.state":"Open","span":{"name":"poll"},"spans":[{"peer":"Client","name":"Connection"},{"name":"poll"}]}
    at src/command/dev/router/runner.rs:201

  DEBUG rover::command::dev::router::runner: uplink response did not change
    at src/command/dev/router/runner.rs:200

  TRACE rover::command::dev::router::runner: signal: Closed
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: pool closed, canceling idle interval
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: poll
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: poll_complete
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: schedule_pending_open
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: flushing buffer
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: send_request dropped, starting conn shutdown
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: {"timestamp":"2023-06-12T15:03:31.966974Z","level":"TRACE","connection.state":"Open","span":{"name":"poll"},"spans":[{"peer":"Client","name":"Connection"},{"name":"poll"}]}
    at src/command/dev/router/runner.rs:201

  DEBUG rover::command::dev::router::runner: send
    at src/command/dev/router/runner.rs:200

  TRACE rover::command::dev::router::runner: encoding GO_AWAY; code=NO_ERROR
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: encoded go_away
    at src/command/dev/router/runner.rs:201

  DEBUG rover::command::dev::router::runner: Connection::poll; connection error
    at src/command/dev/router/runner.rs:200

  TRACE rover::command::dev::router::runner:     -> already going away
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: {"timestamp":"2023-06-12T15:03:31.967037Z","level":"TRACE","connection.state":"Closing(NO_ERROR, Library)","span":{"name":"poll"},"spans":[{"peer":"Client","name":"Connection"},{"name":"poll"}]}
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: connection closing after flush
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: {"timestamp":"2023-06-12T15:03:31.967050Z","level":"TRACE","queued_data_frame":false,"span":{"name":"FramedWrite::flush"},"spans":[{"peer":"Client","name":"Connection"},{"name":"poll"},{"name":"FramedWrite::flush"}]}
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: flushing buffer
    at src/command/dev/router/runner.rs:201

  DEBUG rover::command::dev::router::runner: Sending warning alert CloseNotify
    at src/command/dev/router/runner.rs:200

  TRACE rover::command::dev::router::runner: {"timestamp":"2023-06-12T15:03:31.967151Z","level":"TRACE","connection.state":"Closed(NO_ERROR, Library)","span":{"name":"poll"},"spans":[{"peer":"Client","name":"Connection"},{"name":"poll"}]}
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: Streams::recv_eof
    at src/command/dev/router/runner.rs:201

  TRACE rover::command::dev::router::runner: deregistering event source from poller
    at src/command/dev/router/runner.rs:201

  TRACE hyper::client::pool: idle interval checking for expired
    at /Users/distiller/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.26/src/client/pool.rs:767
WolffRuoff commented 1 year ago

I tested running rover dev with pieces of the router.yaml we're using, and it only appears to break when we're using the enterprise JWT plugin.

EDIT: To add some additional context:

dbanty commented 1 year ago

Thanks for reporting @WolffRuoff! I think I've actually seen this bug as well 👀. To be clear, the environment variables are correctly enabling enterprise features (it doesn't immediately crash saying you're missing entitlements), but the router stops working after a bit if JWT is enabled?

Are you able to use other enterprise-only features successfully without crashing (e.g., operation limits)?

WolffRuoff commented 1 year ago

Hey @dbanty, it doesn't immediately crash. I just get the panicking error after a bit. As for other enterprise-only features, it seems to work fine with the operation limits setting you linked.