gitcoinco / passport

Passport allows users to prove their identity through a secure, decentralized UI

Monitoring of Public and Private APIs #2402

Open nutrina opened 2 months ago

nutrina commented 2 months ago

User Story:

As a developer, I want to achieve 100% monitoring coverage for all our endpoints (private APIs, public APIs, and HTML pages), so that I can have high confidence that our platform is up and running at all times.

This task only refers to Uptime Robot monitoring & alarms. More details about what is envisioned with this requirement are in this Notion doc: https://www.notion.so/gitcoin/Passport-Monitors-PD-Alarms-444bfbe603d146ecbdd54211e1646957?pvs=4#40b94af8a1264dd7a6ac9e99615942ae

Acceptance Criteria

GIVEN we are releasing a new API endpoint WHEN the release process is run AND an Uptime Robot monitor has not been set up for that endpoint THEN I want the workflow to fail

GIVEN we are releasing a new API endpoint WHEN I run the script create_uptime_monitors.ts THEN I want all missing uptime monitors to be created
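The first criterion boils down to a set difference between the endpoints the application exposes and the URLs that already have a monitor. A minimal sketch of that comparison, assuming both lists are already available as plain strings; `normalizeUrl` and `findUnmonitored` are hypothetical names, not taken from the Passport codebase, and query strings are deliberately ignored:

```typescript
// Hypothetical helpers for illustration; not part of the Passport repository.

/** Normalize a URL so trailing slashes and host casing do not cause false mismatches. */
function normalizeUrl(url: string): string {
  const parsed = new URL(url);
  const path = parsed.pathname.replace(/\/+$/, "");
  return `${parsed.protocol}//${parsed.host}${path}`;
}

/** Return every endpoint path that has no matching monitor, preserving input order. */
function findUnmonitored(
  baseUrl: string,
  endpointPaths: string[],
  monitoredUrls: string[]
): string[] {
  const monitored = new Set(monitoredUrls.map(normalizeUrl));
  return endpointPaths.filter(
    (path) => !monitored.has(normalizeUrl(new URL(path, baseUrl).toString()))
  );
}

const missing = findUnmonitored(
  "https://api.scorer.gitcoin.co/",
  ["registry/docs", "passport/docs", "cgrants/docs"],
  [
    "https://api.scorer.gitcoin.co/registry/docs/",
    "https://API.scorer.gitcoin.co/cgrants/docs",
  ]
);

// A release workflow step could exit non-zero whenever this list is non-empty.
console.log(missing);
```

In a workflow, the step would fail the release when `missing` contains at least one entry.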

Documentation and Monitoring Overview: As part of managing this feature, it's crucial to maintain a current and comprehensive record of all monitoring configurations and their statuses. For each task or update, the Notion page on Passport Monitors & PD Alarms must be updated to reflect the latest state and provide an overview of the monitoring topic. This will ensure transparency and continuity in monitoring practices.

Product & Design Links:

#### Tech Details:

- **Monitoring Tool:** Utilize Uptime Robot for monitoring. Consider establishing a dedicated account if scalability or management issues arise with the current setup.
- **Automation:**
  - Automate the creation of monitors using the Uptime Robot API.
  - Explore the possibility of creating a Pulumi provider to manage Uptime Robot configuration as code. See potential approach for translating similar Terraform configuration to Pulumi: [Pulumi and UptimeRobot Integration](https://www.pulumi.com/ai/answers/3JDnGAzv7mC5nULqtQkcYY/translating-terraform-to-pulumi-uptimerobot-module).
  - Automatically validate the monitors by comparing the list of active monitors against the endpoint list output by the Django application. Implement a mechanism to regularly check and ensure all endpoints are covered without manual intervention.
- It is possible to limit the number of notifications PD sends by configuring the `alert_contacts`; see the documentation for `alertcontact>threshold` and `alertcontact>recurrence` here: https://uptimerobot.com/api/#parameters

#### Open Questions:

#### Notes/Assumptions:

- Assume current API traffic and performance patterns will remain consistent unless noted by recent changes.
- Endpoint definitions and statuses are dynamically documented and accessible for integration with monitoring tools.
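As I read the Uptime Robot API documentation, the `alert_contacts` parameter mentioned under Tech Details is a single string in which each contact is written as `id_threshold_recurrence` and contacts are joined with `-`. A small sketch of building that value; `AlertContactSetting` and `formatAlertContacts` are hypothetical names, and the exact semantics of threshold and recurrence should be confirmed against the linked docs:

```typescript
// Hypothetical types and helper for illustration only.
interface AlertContactSetting {
  id: string; // Uptime Robot alert contact ID
  threshold: number; // "if down for x minutes" value
  recurrence: number; // "alert every y minutes" value
}

/** Build the alert_contacts string, e.g. "457_0_0-373_5_0". */
function formatAlertContacts(contacts: AlertContactSetting[]): string {
  for (const c of contacts) {
    const values = [c.threshold, c.recurrence];
    if (values.some((v) => !Number.isInteger(v) || v < 0)) {
      throw new Error(`threshold and recurrence must be non-negative integers (contact ${c.id})`);
    }
  }
  return contacts.map((c) => `${c.id}_${c.threshold}_${c.recurrence}`).join("-");
}

// The contact IDs below are placeholders, not real Passport contacts.
const alertContacts = formatAlertContacts([
  { id: "457", threshold: 0, recurrence: 0 },
  { id: "373", threshold: 5, recurrence: 0 },
]);
console.log(alertContacts);
```

Raising the threshold for the PD contact is one way to keep short blips from paging anyone.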
lucianHymer commented 1 month ago

Need some details on Uptime robot before finishing, will discuss with engineers today.

Edit: Blocked until we get a paid Uptime Robot account.

lucianHymer commented 1 month ago

Command to get unmonitored URLs (can be found in workflow files)

python manage.py show_urls -f json > urls.json &&
python manage.py get_unmonitored_urls --urls urls.json --base-url https://api.scorer.gitcoin.co --out unmonitored.json --allow-paused True

The READONLY API key must be populated in passport-scorer/api/.env (this should be added to the GitHub ENV).

On the infra side, the command has been added to passport-scorer/infra/README:

npx tsx scripts/uptime_robot/create_monitor.ts --help

The Read/Write API key must be populated in passport-scorer/infra/.env.

This will show you the different types of monitors you can create. Run a command like

npx tsx scripts/uptime_robot/create_monitor.ts simple_get https://api.scorer.gitcoin.co/ cgrants/docs passport/docs registry/docs

To create a long-lived (100 years) ceramic cache token, run e.g.:

python manage.py generate_ceramiccache_access_token --address 0x4a08ae41F821BA18fc0263D8f278bfb017512023
lucianHymer commented 1 month ago

I've created all the simple GET alarms in the new Passport account.

nutrina commented 1 month ago

I have merged the current state into main. The current approach, however, should be revised before continuing.

According to the second acceptance criterion, we need a single script that creates all alarms, preferably without any configuration parameters (except the access key).

Similar to the Pulumi scripts, where we create AWS alarms, create_uptime_monitors.ts should create the Uptime Robot alarms programmatically.
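One possible shape for such a script is a declarative list of monitors kept in code, from which the script derives the requests for whatever is missing. The sketch below only builds the form bodies for Uptime Robot's v2 `newMonitor` endpoint and does no network I/O. `MonitorDefinition`, `MONITORS`, `buildNewMonitorBodies`, and the friendly names are all hypothetical; the paths are the ones from the `simple_get` example earlier in this thread.

```typescript
// Hypothetical declarative monitor list; not the actual create_uptime_monitors.ts.
interface MonitorDefinition {
  friendlyName: string;
  path: string;
}

const BASE_URL = "https://api.scorer.gitcoin.co/";

const MONITORS: MonitorDefinition[] = [
  { friendlyName: "cgrants docs", path: "cgrants/docs" },
  { friendlyName: "passport docs", path: "passport/docs" },
  { friendlyName: "registry docs", path: "registry/docs" },
];

/** Build one form body per monitor that does not exist yet (exact URL match). */
function buildNewMonitorBodies(
  apiKey: string,
  baseUrl: string,
  definitions: MonitorDefinition[],
  existingUrls: Set<string>
): URLSearchParams[] {
  return definitions
    .map((def) => ({ def, url: new URL(def.path, baseUrl).toString() }))
    .filter(({ url }) => !existingUrls.has(url))
    .map(
      ({ def, url }) =>
        new URLSearchParams({
          api_key: apiKey,
          format: "json",
          type: "1", // 1 = HTTP(s) monitor in the Uptime Robot v2 API
          friendly_name: def.friendlyName,
          url,
        })
    );
}

// In a real run, existingUrls would come from the getMonitors endpoint and each
// body would be POSTed to https://api.uptimerobot.com/v2/newMonitor.
const bodies = buildNewMonitorBodies(
  "placeholder-read-write-key",
  BASE_URL,
  MONITORS,
  new Set(["https://api.scorer.gitcoin.co/registry/docs"])
);
console.log(bodies.map((b) => b.get("url")));
```

With this layout, the access key is the only input, and adding an endpoint means adding one entry to `MONITORS`.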