farcasterxyz / hub-monorepo

Implementation of the Farcaster Hub specification and supporting libraries for building applications on Farcaster
https://www.thehubble.xyz
MIT License

feat: deploy and test hub network on aws #154

Closed. varunsrin closed this issue 1 year ago.

varunsrin commented 2 years ago

What is the feature you would like to implement? Deploy 3 Hubs on AWS, have them gossip messages to each other, and measure message-transfer success rates when gossiping a large volume of messages.

Why is this feature important? It's important to get this done early so we can start battle-testing the p2p implementation.

Will the protocol spec need to be updated? No

How should this feature be built?

sagar-a16z commented 2 years ago

@varunsrin, I'd like to describe some high level expectations from this test as well as the tools we have available.

High level overview

Phase 1

  1. Run Hub 1 on AWS (with Simple Sync disabled)
  2. Use the Benchmark client to send some messages to the Hub from your local machine (or another AWS instance)
     a. This client uses RPC APIs to submit numerous requests to the Hub
  3. Verify no errors occurred (the client will error out if the Hub fails to handle anything)

Phase 2

  1. Start Hub 2 on AWS and bootstrap it off of Hub 1
  2. Look for "Sync Progress" and "Sync completed" in Hub 2's logs, showing that Simple Sync completed successfully
     a. Example: 12D3KooWLGB5erXP4H1AyKXw8G8RxZ6b2nCjaTG7cqNLfAg32sS3 Sync Progress( Fid: 73128 ): Merged: 320 messages and failed 0 messages
     b. Example: 12D3KooWLGB5erXP4H1AyKXw8G8RxZ6b2nCjaTG7cqNLfAg32sS3 Sync completed
  3. Use the Benchmark client to send some messages to Hub 1.
  4. Manually verify that no errors occur in Hub 2's logs (the client doesn't currently verify other Hubs on the network, so this step is manual).

Given the Benchmark client's current capabilities and the lack of snapshot sync, this is what we can run to test a network of Hubs. We should update the Benchmark client to keep track of a list of Hubs in the network so that, after a run, it can verify (over RPC) that every Hub has reached the same state; it doesn't do this today.
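As a rough illustration, that verification step could look something like the sketch below. The `HubRpcClient` interface and `getStateRoot()` call are hypothetical stand-ins for whatever the real RPC client ends up exposing, not the actual API:

```ts
// Hypothetical sketch: ask every Hub for a digest of its state over RPC and
// check that they all match. HubRpcClient and getStateRoot() are placeholders.
interface HubRpcClient {
  getStateRoot(): Promise<string>;
}

async function verifyHubsConverged(clients: HubRpcClient[]): Promise<boolean> {
  // Fetch a state digest from every Hub in the network.
  const roots = await Promise.all(clients.map((client) => client.getStateRoot()));

  // All Hubs have converged if every digest matches the first one.
  const allEqual = roots.every((root) => root === roots[0]);
  if (!allEqual) {
    console.error('Hubs have diverged:', roots);
  }
  return allEqual;
}
```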

### How to run a Hub

1. Create an identity for the Hub before starting it:

        $ yarn identity create
        Successfully Wrote peerId: 12D3KooWFamBRXiLTBNWRCSkSVwctXGxcXiYsVhgYcDcPeY6hYfW to ./.hub/id.protobuf

2. Start Hub 1 with Simple Sync disabled (since this is the first Hub in the network, it has nowhere to sync from, and we don't want it to Simple Sync with a future peer):

        $ yarn start --simple-sync false
        12D3KooWDv4za5JHxCQRYaJADFncW9qZpRv1L3QtkKAJEnLge1Hi LibP2P started...
        listening on addresses: /ip4/127.0.0.1/tcp/52474/p2p/12D3KooWDv4za5JHxCQRYaJADFncW9qZpRv1L3QtkKAJEnLge1Hi
        RPC server started: { address: '::', family: 'IPv6', port: 52475 }

3. Start Hub 2 with similar options, but bootstrap it off of Hub 1 by using Hub 1's listening address:

        $ yarn start -B '/ip4/<HUB_1_PUBLIC_IP>/tcp/52474/p2p/12D3KooWDv4za5JHxCQRYaJADFncW9qZpRv1L3QtkKAJEnLge1Hi'
        12D3KooWSdZj972QcysLSHFhFZduFMzz9bPRKpPKDYTiJ1HWEKhZ LibP2P started...
        listening on addresses: /ip4/127.0.0.1/tcp/57384/p2p/12D3KooWSdZj972QcysLSHFhFZduFMzz9bPRKpPKDYTiJ1HWEKhZ
        12D3KooWSdZj972QcysLSHFhFZduFMzz9bPRKpPKDYTiJ1HWEKhZ Attempting to connect to address: /ip4/127.0.0.1/tcp/52474/p2p/12D3KooWDv4za5JHxCQRYaJADFncW9qZpRv1L3QtkKAJEnLge1Hi
        12D3KooWSdZj972QcysLSHFhFZduFMzz9bPRKpPKDYTiJ1HWEKhZ Connected to peer at address: /ip4/127.0.0.1/tcp/52474/p2p/12D3KooWDv4za5JHxCQRYaJADFncW9qZpRv1L3QtkKAJEnLge1Hi

Note - Use the `--rpc-port` and `--port` options to choose the ports the Hub listens on.

### How to run the Benchmark client

The bench client is very simple. It takes the address and port of a Hub's RPC server, generates registry information and signer events for `U` users (default is 100), and finally sends 1 cast per user to the configured Hub.

    $ yarn bench -A "" -R 52475 -U 1000
    yarn run v1.22.19
    Using RPC server: /52475
    Generating IDRegistry events for 1000 users.
    Time 15.899s : Generated 1000 users. UserInfo has 1000 items
    Time 1.479s : IDRegistry Events submitted
    1000 events submitted successfully. 0 events failed.
    Waiting a few seconds for the network to synchronize
    Time 4.326s : Signers submitted
    1000 signers submitted successfully. 0 signers failed.
    Waiting a few seconds for the network to synchronize
    Generating Casts for 1000 users
    Time 0.809s : Generated 1 cast for each user
    Time 5.522s : Casts submitted
    1000 Casts submitted successfully. 0 Casts failed.

| Time | Task |
| --- | --- |
| 15.899s | Account generation time |
| 1.479s | IDRegistry Events |
| 4.326s | Signer Add Messages |
| 0.809s | Cast Messages |
| 22.514s | Total |

✨ Done in 51.14s.
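For reference, the flow described above boils down to roughly the sketch below. All of the helper names (`generateUser`, `submitIdRegistryEvent`, `submitSignerAdd`, `submitCast`) are hypothetical placeholders for the client's real generators and RPC calls:

```ts
// Rough sketch of the bench client's flow; every helper below is a
// hypothetical placeholder, not the client's actual API.
declare function generateUser(fid: number): unknown;
declare function submitIdRegistryEvent(rpcAddress: string, user: unknown): Promise<void>;
declare function submitSignerAdd(rpcAddress: string, user: unknown): Promise<void>;
declare function submitCast(rpcAddress: string, user: unknown): Promise<void>;

async function runBench(rpcAddress: string, userCount = 100): Promise<void> {
  // Generate registry info for U users, add a signer for each, then send one cast per user.
  const users = Array.from({ length: userCount }, (_, i) => generateUser(i + 1));

  for (const user of users) {
    await submitIdRegistryEvent(rpcAddress, user);
  }
  for (const user of users) {
    await submitSignerAdd(rpcAddress, user);
  }
  for (const user of users) {
    await submitCast(rpcAddress, user);
  }
}
```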

varunsrin commented 2 years ago

Great writeup, thanks! Tagging in @sds for a second pair of 👀

A few comments/suggestions for this spec:

  1. What do we want to do to track errors? We probably want some or all of: a logging framework that logs structured data (like pino), a visualizer + search tool for logs (like AWS CloudWatch), and exception-tracking software (like Sentry). (See the pino sketch after this list.)

  2. Our goal is to be able to run at least 10 Hubs, so I think our tests should aim to have at least that many in sync so we can identify issues that come up at that scale.

  3. One cast per user may catch simple issues with sync, but we should send at least 2-5 messages of each other type.

  4. Where should the AWS deployment and config be stored? I don't love the idea of putting AWS-specific stuff in the open-source repo since it won't be relevant for most users of the lib. (cc @sds who may have thoughts)

  5. We should test taking down and restarting Hubs and see if they heal correctly.
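On the structured-logging option in point 1, a minimal pino setup could look like this sketch (the field names are illustrative, not the Hub's actual log schema):

```ts
import pino from 'pino';

// Structured JSON logs that a log search/visualization tool (e.g. CloudWatch) can index.
const logger = pino({ name: 'hub' });

// Field names below are illustrative only.
logger.info({ peerId: '12D3KooW...', merged: 320, failed: 0 }, 'Sync Progress');
logger.error({ peerId: '12D3KooW...', reason: 'connection reset' }, 'Gossip failure');
```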

One longer-term thought, and cc'ing @pfletcherhill for this: we probably want a "Chaos Monkey" version of the benchmark tool that can constantly generate and send messages to Hubs. Think of it as an app-level fuzzer that can randomly generate contextually relevant messages and send them to Hubs. We should start by developing this locally but expand to using it in the P2P tests before we launch, to make sure there are no "gotchas" in the Hub.
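A very rough sketch of what that tool could look like; `generateRandomMessage` and `submitMessage` are hypothetical placeholders for the real message generators and RPC client:

```ts
// Hypothetical "Chaos Monkey" loop: pick a random Hub and fire a randomly
// generated message at it, forever. Both helpers below are placeholders.
declare function generateRandomMessage(): unknown;
declare function submitMessage(rpcAddress: string, message: unknown): Promise<void>;

async function chaosMonkey(hubRpcAddresses: string[], intervalMs = 1000): Promise<void> {
  for (;;) {
    const hub = hubRpcAddresses[Math.floor(Math.random() * hubRpcAddresses.length)];
    try {
      await submitMessage(hub, generateRandomMessage());
    } catch (err) {
      // Log and keep going; the point is to keep constant pressure on the network.
      console.error('submit failed', { hub, err });
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```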

sagar-a16z commented 2 years ago

> A few comments/suggestions for this spec:
>
> 1. What do we want to do to track errors? We probably want some or all of: a logging framework that logs structured data (like pino), a visualizer + search tool for logs (like AWS CloudWatch), and exception-tracking software (like Sentry).

Agreed. I don't have a preference here. Anything that can track metrics and logs, plus CloudWatch to restart or notify when something goes down.

> 2. Our goal is to be able to run at least 10 Hubs, so I think our tests should aim to have at least that many in sync so we can identify issues that come up at that scale.

We can update this test to run against 10 Hubs. That should be fine. I think right now, as long as Hubs are always up and running, we should be okay. We need Diff Sync to really be able to test a bunch more scenarios with restarts and partitions.

> 3. One cast per user may catch simple issues with sync, but we should send at least 2-5 messages of each other type.

This is very easy to add to the bench client. Let me know when I should focus on this.

> 5. We should test taking down and restarting Hubs and see if they heal correctly.

I think this might work, but I'd be much more confident in testing this once we have Diff Sync.

sds commented 2 years ago

For this proposal, were we planning to automate the testing/verification process end to end, or were we seeking an environment that allows an individual to manually make requests to each hub and play around?

If the former, what if we just stood up the Hub(s) using Docker Compose and ran the automated integration tests against them in a GitHub Actions runner? This wouldn't be our solution forever, just something to unblock this initiative for now.

If the latter, there's a deeper discussion to be had so we're clear on how this environment should be accessed and who can deploy to it (or under what conditions).

sanjayprabhu commented 2 years ago

+1 to what @sds said about docker compose. There are two separate issues here:

  1. A network test with multiple hubs to make sure they can communicate and handle various scenarios
  2. A deployment test to make sure hubs can run in the intended production environment (AWS) under realistic conditions

For 1, it should be as automated as possible. We should use Docker Compose and just make it part of CI (it could even be nightly if it takes too long).

For 2, it can be a one-off manual test to make sure there are no AWS-specific issues. I don't think we'd need to test in-depth synchronization scenarios here; we could just log and compare the Merkle root hashes on all the instances to make sure they are the same.