Police-Data-Accessibility-Project / data-sources-app

An API and UI for using and maintaining the Data Sources database
MIT License

quicksearch not showing results #154

Closed josh-chamberlain closed 3 months ago

josh-chamberlain commented 9 months ago

Context

Sometimes, the app reaches a state where every search shows "no results" and we have to rebuild to fix it.

[Screenshot: Screen Shot 2023-12-21 at 10 51 15 AM]

Steps to reproduce

Currently unknown. Typically, it shows up after waiting around a week.

Fixing the symptoms

we may not be able to fix this, but we should rebuild the app when it stops responding. search must work.

josh-chamberlain commented 4 months ago

This has recurred. Adjusting the issue to regularly check status and trigger a rebuild if it fails.

maxachis commented 4 months ago

I created a repository at https://github.com/Police-Data-Accessibility-Project/health-monitoring which I'll use to develop health monitoring scripts. It'll be fairly simple to start with, but its functionality is distinct enough that I think even now it's useful to keep it separate from other repositories. Over time, we can expand it with more in-depth health checks.

josh-chamberlain commented 4 months ago

@maxachis thanks, that's a good idea. I added a README and LICENSE.

maxachis commented 4 months ago

@josh-chamberlain I'll need the WEBHOOK_URL that is used to post dev alerts to Discord in order to finish setting up a basic alerting system.

The current plan is to be conservative and have it call the search endpoint every hour. After that, I'll look into using the Digital Ocean API to trigger rebuilds on failure, but the first step is to make sure it works properly and doesn't report false positives, which unfortunately means we may have to wait a little while until it fails again.
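
As a rough illustration of the hourly check described above, here is a minimal Python sketch. It is not the code in the health-monitoring repository. The search URL, the `data` key in the response, and every function name are assumptions made for illustration; the only external detail relied on is that a Discord webhook accepts a JSON body with a `content` field. The fetcher and the alert function are injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# Hypothetical placeholders: neither the real search endpoint nor the
# webhook URL appears in this thread.
SEARCH_URL = "https://example.invalid/search?q=test"
WEBHOOK_URL = "https://example.invalid/discord-webhook"


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_search(fetch: Callable[[str], Any] = fetch_json,
                 url: str = SEARCH_URL) -> Optional[str]:
    """Return None when search looks healthy, else a description of the problem."""
    try:
        payload = fetch(url)
    except Exception as exc:  # network error, timeout, HTTP error, bad JSON
        return f"search request failed: {exc!r}"
    # Assumption: results arrive as a list under a "data" key.
    results = payload.get("data") if isinstance(payload, dict) else None
    if not results:
        return "search returned no results"
    return None


def build_alert(problem: str) -> bytes:
    """Encode the alert as the JSON body a Discord webhook expects."""
    return json.dumps({"content": f"Health check failed: {problem}"}).encode("utf-8")


def post_alert(problem: str, webhook_url: str = WEBHOOK_URL) -> None:
    """POST the alert to the webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=build_alert(problem),
        headers={"Content-Type": "application/json",
                 "User-Agent": "health-monitoring-sketch"},
        method="POST",
    )
    urllib.request.urlopen(request, timeout=30).close()


def run_once(fetch: Callable[[str], Any] = fetch_json,
             alert: Callable[[str], None] = post_alert) -> bool:
    """Run one check; alert and return False if search is unhealthy."""
    problem = check_search(fetch)
    if problem is not None:
        alert(problem)
        return False
    return True
```

A scheduler such as cron could call `run_once()` every hour. Treating an empty result list as a failure is a deliberate choice here, since the symptom in this issue is a search that responds but shows "no results"; the query used would need to be one that is known to return matches.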

josh-chamberlain commented 4 months ago

@maxachis sounds like a good plan. DMing it to you, and adding it as an org-level secret...one day we could rename it, probably. i'll work on wrangling our secrets a bit better so we know what they're for.

maxachis commented 4 months ago

@josh-chamberlain I've finished up the first draft of the health-monitoring repository and set it up in the "Automation-Manager" Droplet (formerly "Database-Automation-Manager", but renamed since this part doesn't touch the database).

As designed, the manager will log errors to Discord, and log all events to a rotating log in the root directory of the repository, which rotates every day at midnight. At the moment, this is designed just to let us confirm that it performs the intended logic, but it could be expanded (and the rate of log rotation modified) in the future.

Now it's going to be a waiting game until the search fails. If everything works properly, it'll post to Discord when it occurs.
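
For readers unfamiliar with rotating logs, a log that rolls over at midnight can be set up with the standard library's `TimedRotatingFileHandler`. The sketch below is an assumption about how this might look, not the repository's actual configuration; the file name, logger name, and `backup_count` default are hypothetical.

```python
import logging
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path


def make_logger(log_dir: Path, when: str = "midnight",
                backup_count: int = 7) -> logging.Logger:
    """Build a logger whose file rolls over on the schedule given by `when`."""
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = TimedRotatingFileHandler(
        log_dir / "health_monitor.log",  # hypothetical file name
        when=when,                       # "midnight" = roll over daily at 00:00
        backupCount=backup_count,        # rotated files kept before deletion
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    logger = logging.getLogger("health_monitor_sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    # Drop handlers from any earlier call so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    logger.addHandler(handler)
    return logger
```

The `when` argument is exposed so the cadence can be changed without touching the rest of the setup; for example, `when="W0"` rolls the file over weekly on Mondays.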

Aside

We may want to add documentation for the Automation-Manager droplet, discussing what it does and what repositories it hosts.

josh-chamberlain commented 3 months ago

@maxachis ah, thank you! That's great. A rotating log is great. Could we do it every week or 3 days, instead, just so they can be inspected even after a weekend or whatever?

Yes, documentation for the automation-manager droplet is critical because it's difficult for me (and other less-technical or less-paying-attention people) to tell how things are deployed. I just asked on a separate issue if we were ready to make a github action for this.

  1. in the README of this repo, we should say where it's deployed / how often it is triggered
    • if i make and merge changes, will they auto-deploy or do i need to do something?
    • how can i check to verify that this is still running / see the last time it ran?
  2. that might be it. long-term we should have some kind of statuspage but I think we're good for now

out of curiosity: what makes the droplet better than periodically triggering a github action? I like the Action because it's pretty transparent/happens next to the code, but I'm sure you have more control this way.

maxachis commented 3 months ago

@josh-chamberlain Control is a major component of it, but so is ease of development. Using GitHub Actions (GA) for more complex operations, such as prod-to-dev migration or health monitoring, has been challenging in my experience compared to a Digital Ocean (DO) droplet, due to several interrelated factors:

  1. GA builds up and tears down an environment on every run. As repositories grow in size or scope, this adds operation time for what is effectively redundant work, whereas in DO the environment simply persists between runs. That can become a cost issue, especially if we're running lengthy operations multiple times a day. For smaller actions in particular, setting up and installing all dependencies can take considerably longer than the operation that setup exists to support.
  2. It is generally more difficult to debug in GA. There's often a delay in triggering a GA, and even if it's half a minute, that adds up quickly if I'm checking on something iteratively, and it's longer still if I have to wait for setup to complete. I also can't step through a GA run in a debugger as I can with other components. In DO, it's much easier to debug, run, and verify, because I can do it all from an Ubuntu command line.
  3. I have better insight into the environment in DO than in GA. While GA does use an Ubuntu environment, I can't access that environment via a shell and poke around to see what exists, what doesn't, and how commands are interpreted.

That being said, there are options that allow us to blend the two approaches:

maxachis commented 3 months ago

@josh-chamberlain Additionally, I created an issue for adding documentation about the Automation Manager.

josh-chamberlain commented 3 months ago

thanks @maxachis , this is helpful. I'll save it in our ADRs so we know when/why to abandon a github action prototype for a deployed thing.

josh-chamberlain commented 3 months ago

(here's the ADR I made retroactively, visible if you have notion perms)

maxachis commented 3 months ago

I will say that GitHub Actions is the better option for things that interface directly with the repository they live in and where we want to inspect changes immediately after or during pull requests. Tests, linting, security checks: these all make sense to keep as GitHub Actions, for two reasons:

  1. Libraries for these tasks (such as pytest or bandit) already tend to support GitHub Actions out of the box, with minimal configuration.
  2. It enables easy review at the point of pull requests, allowing us to quickly diagnose problems before they're merged. By contrast, something like prod-to-dev migration or health monitoring doesn't provide feedback immediately relevant to the associated repositories.

Additionally, we may benefit from synchronizing all our different workflows through something like Jenkins, which would help formalize the more complex CI/CD processes and integrate them under a single user interface.

maxachis commented 3 months ago

@josh-chamberlain The repository is updated so the log rotates every week, and the README now includes the requested information (where it's deployed, how often it's triggered, and how to check that it's running). Have a look at the README and let me know if it looks good to you.

josh-chamberlain commented 3 months ago

@maxachis nice! thank you. I'm good to close this as can't-repro while we let health-monitoring do its thing.