Implement GitHub Actions runs

caldwellst commented 4 months ago

We need to move the GitHub Actions runs over to the new system. For the core runs, each indicator should have its own action, similar to the old global-monitoring.yaml file, with one per indicator that runs update_{indicator}.R. My idea is to have a single workflow that sets up the system (a core base workflow), followed by a simple additional workflow for each indicator that runs its script.

And Hannah had this nice idea, which could also utilise the base workflow:

> Personally, I'd advocate for setting things up so that scripts are never run locally! Even for basic one-time things like the first run or updating the static datasets, it's so nice to have them in an Action that's triggered manually with a button click! That way everything is super traceable and easy for anyone else to pick up without worrying about local environments, etc.

Hopefully we can set all of these up soon! At a minimum, the base GitHub Actions runs are critical for release.

hannahker commented 4 months ago

@caldwellst what do you think of:

| Name | Trigger | Outputs | Description |
|---|---|---|---|
| `update_assets.yml` | Manual | All static assets in Azure (e.g. name mapping, map data) | Runs all update scripts in `src-static/` |
| `first_run.yml` | Manual | Archived signals in the root `signals.parquet` file | Runs `generate_signals.R` with `first_run=TRUE` |
| `monitor_acled.yml` | TBD | Signals in the `{indicator_id}/signals.parquet` file, with associated content and draft campaigns in Mailchimp | Runs `update_{indicator}.R` |
| `monitor_idmc.yml` | TBD | Signals in the `{indicator_id}/signals.parquet` file, with associated content and draft campaigns in Mailchimp | Runs `update_{indicator}.R` |
| `monitor_jrc.yml` | TBD | Signals in the `{indicator_id}/signals.parquet` file, with associated content and draft campaigns in Mailchimp | Runs `update_{indicator}.R` |
| `monitor_ipc.yml` | TBD | Signals in the `{indicator_id}/signals.parquet` file, with associated content and draft campaigns in Mailchimp | Runs `update_{indicator}.R` |
| `monitor_wfp.yml` | TBD | Signals in the `{indicator_id}/signals.parquet` file, with associated content and draft campaigns in Mailchimp | Runs `update_{indicator}.R` |
| `check_signals.yml` | Daily | Slack notifications when new rows are added to each `{indicator_id}/signals.parquet` file | Daily monitoring for new alerts that need to be triaged |
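As a rough illustration of the "Manual" trigger in the table, a minimal `update_assets.yml` could use `workflow_dispatch`, which adds a "Run workflow" button in the Actions tab. This is only a sketch: the `ubuntu-latest` runner, using `r-lib/actions/setup-renv@v2` for package restore, and treating every `.R` file directly under `src-static/` as a standalone update script are assumptions, and any Azure credentials the scripts need are not shown.

```yaml
# Hypothetical update_assets.yml: manual-only workflow for static assets
name: Update static assets

on:
  workflow_dispatch: # manual trigger only (button click in the Actions tab)

jobs:
  update-assets:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev libudunits2-dev libgdal-dev
      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: 4.2.3
      # Assumption: packages are pinned with renv; this installs renv and restores renv.lock
      - uses: r-lib/actions/setup-renv@v2
      # Assumption: each .R file directly under src-static/ is a standalone update script
      - name: Run all static update scripts
        run: |
          for script in src-static/*.R; do
            echo "Running $script"
            Rscript "$script"
          done
```

Because the default `run` shell on Linux runners is `bash -e`, the loop stops at the first script that fails, so a broken asset update is visible as a failed run rather than silently skipped.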

caldwellst commented 4 months ago

So, a few thoughts:

`first_run.yml`

A first_run.yml would be awkward because generate_signals() is run for a specific indicator, as in your proposed monitor_{ind}.yml files, so an overarching script would be difficult. How about parameterising the monitor_{ind}.yml workflows with the following parameters:

- first_run: defaults to FALSE when the action is run manually, and is set explicitly to FALSE in the CRON job
- test: defaults to TRUE when the action is run manually, and is set explicitly to FALSE in the CRON job

This way, we could test the indicator runs with generate_signals(test = TRUE) through GitHub Actions, and also regenerate the data if it ever becomes necessary (though ideally it never will be!).
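A minimal sketch of how a parameterised `monitor_{ind}.yml` might look, using ACLED as the example. Manual runs expose `first_run` and `test` as boolean inputs with the defaults proposed above; scheduled runs have no inputs, so both flags resolve to `false`. The cron time, the script path, and passing the flags as `FIRST_RUN`/`TEST` environment variables (which the R script would have to read, e.g. with `Sys.getenv()`, and forward to generate_signals()) are hypothetical, and the shared setup steps are omitted.

```yaml
# Hypothetical monitor_acled.yml: one workflow per indicator
name: Monitor ACLED

on:
  schedule:
    - cron: "0 6 * * *" # hypothetical daily run at 06:00 UTC
  workflow_dispatch:
    inputs:
      first_run:
        description: "Regenerate archived signals (first_run = TRUE)"
        type: boolean
        default: false
      test:
        description: "Run in test mode (test = TRUE)"
        type: boolean
        default: true

jobs:
  monitor:
    runs-on: ubuntu-latest
    env:
      # On schedule events the inputs context is empty, so both flags evaluate to false
      FIRST_RUN: ${{ github.event_name == 'workflow_dispatch' && inputs.first_run || false }}
      TEST: ${{ github.event_name == 'workflow_dispatch' && inputs.test || false }}
    steps:
      - uses: actions/checkout@v3
      # ... shared system and R setup steps would go here ...
      - name: Run indicator update
        # Hypothetical path; the script would map FIRST_RUN/TEST onto generate_signals()
        run: Rscript src/indicators/acled/update_acled.R
```

Deriving the flags from `github.event_name` keeps the CRON path fixed at `first_run = FALSE, test = FALSE` without needing a second workflow file.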

`check_signals.yml`

I think the issue here is that we wouldn't want to send out an email without immediately updating the dataset. It would be better for the notification to be sent as soon as we APPROVE in triage_signals(), so it wouldn't be an action of its own.

`system_setup.yml`

Lastly, should we have an action file like those above that handles system setup and that all the others use, so we know they all have sufficient machine setup for whatever we need?

hannahker commented 4 months ago

`first_run.yml`

Including this in the monitor_{ind}.yml makes sense to me!

`check_signals.yml`

Maybe there are two separate pieces here. I'm imagining this as a mechanism for letting maintainers know when they have to go through the triage_signals() process. Something like: "{X} alerts were detected for {indicator} on {date}. Please review draft campaigns and triage the signals to send alerts." Then someone goes through the triage_signals() process and the alerts get sent out.

`system_setup.yml`

Are you thinking of something like a better-packaged version of this? Agreed that it would be a bit annoying to copy this into each action. My understanding

```yaml
# Job body. RENV_PATHS_ROOT must be defined so the cache step and renv::restore()
# use the same package root (the path below is an assumption; any writable path works)
env:
  RENV_PATHS_ROOT: ~/.local/share/renv
steps:
  - uses: actions/checkout@v3
  # Solution to unstable Azure server access is to update apt-get
  # Found on: https://github.com/actions/runner-images/issues/675
  - name: Install required dependencies
    run: |
      sudo apt-get update
      sudo apt-get install -y -f libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev libudunits2-dev libgdal-dev
  - name: Set up R 4.2.3
    uses: r-lib/actions/setup-r@v2
    with:
      r-version: 4.2.3
  - uses: r-lib/actions/setup-pandoc@v2
  - name: Cache packages
    uses: actions/cache@v3
    with:
      path: ${{ env.RENV_PATHS_ROOT }}
      key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
      restore-keys: |
        ${{ runner.os }}-renv-
  - name: Restore packages
    shell: Rscript {0}
    run: |
      if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
      renv::restore()
```

caldwellst commented 4 months ago

lol sorry, I think my brain capacity was reaching a minimum. What I meant for check_signals.yml was that, rather than checking once a day whether signals have been updated, we just attach the notification to the end of generate_signals(). Then, if update_{ind}.R (run through the monitor_{ind}.yml action) generates any signals, a Slack notification is sent immediately. That reduces the number of actions, and we know instantly when we need to triage.
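If generate_signals() sends the Slack notification itself, the monitor workflow only needs to expose a Slack credential to the R process. A minimal sketch of that step, assuming a Slack incoming webhook stored as a repository secret named `SLACK_WEBHOOK_URL` (a hypothetical name) and a hypothetical script path; on the R side the URL would be read with `Sys.getenv("SLACK_WEBHOOK_URL")`.

```yaml
# Hypothetical step inside a monitor_{ind}.yml job
- name: Run indicator update
  env:
    # Secret name is an assumption; generate_signals() would post to this webhook
    # after new signals are written, so no separate notification workflow is needed
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
  run: Rscript src/indicators/acled/update_acled.R # hypothetical path
```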

And yeah, system_setup.yml would be a better-packaged version of that, or maybe just that; I'm not sure how much we can simplify it, but let's see! It looks like we could use it as the initial setup for all the other scripts by following this process for reusing workflows?
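A minimal sketch of `system_setup.yml` as a reusable workflow (triggered by `workflow_call`) that takes the script to run as an input, plus a caller. One caveat: a called workflow runs as its own job on a fresh runner, so its setup cannot be shared with later steps in the caller's job; the script therefore has to run inside the reusable workflow. The `script` input, the file paths, and using `r-lib/actions/setup-renv@v2` in place of the manual cache/restore steps quoted earlier in this thread are assumptions. A composite action is the usual alternative if the setup should be a step rather than a whole job.

```yaml
# .github/workflows/system_setup.yml (hypothetical): shared setup, then run one script
name: System setup

on:
  workflow_call:
    inputs:
      script:
        description: "Path to the R script to run after setup"
        required: true
        type: string

jobs:
  run-script:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install required system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev libudunits2-dev libgdal-dev
      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: 4.2.3
      - uses: r-lib/actions/setup-pandoc@v2
      # Installs renv, then restores (and caches) packages from renv.lock
      - uses: r-lib/actions/setup-renv@v2
      - name: Run script
        run: Rscript "${{ inputs.script }}"
---
# .github/workflows/monitor_idmc.yml (hypothetical caller, a separate file)
name: Monitor IDMC
on:
  workflow_dispatch:
jobs:
  monitor:
    uses: ./.github/workflows/system_setup.yml
    with:
      script: src/indicators/idmc/update_idmc.R # hypothetical path
    secrets: inherit # pass repository secrets (e.g. Azure, Mailchimp) to the called workflow
```

With this split, each `monitor_{ind}.yml` shrinks to a trigger plus a single `uses:` job, and any change to the machine setup is made once in `system_setup.yml`.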

hannahker commented 4 months ago

Ok yes this makes sense!

hannahker commented 3 months ago

Closed with #105

OCHA-DAP / hdx-signals