OCHA-DAP / hdx-signals

HDX Signals
https://un-ocha-centre-for-humanitarian.gitbook.io/hdx-signals/
GNU General Public License v3.0

Implement GitHub action runs #70

Closed caldwellst closed 3 months ago

caldwellst commented 4 months ago

We need to move the GitHub Actions runs over to the new system. For the core runs, each indicator should have its own action, similar to the old global-monitoring.yaml file, but with one action per indicator that runs its update_{indicator}.R script. My idea was to have a single script that does the setup of the system, like a core base workflow, and then a simple additional workflow afterwards to run the script for each indicator.

And Hannah had this nice idea, which could also utilise the base workflow:

Personally, I'd advocate for setting things up so that scripts are never run locally! Even if it's basic one-time things like the first run, or updating the static datasets, it's so nice to have those up in an Action that's just triggered manually on button click! That way everything is super traceable and easy for anyone else to pick up without worrying about local environments, etc

So hopefully we can set all of these up soon! Critical for release is at least the base GitHub action runs.
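As a rough illustration of the "triggered manually on button click" idea, a workflow only needs a `workflow_dispatch` trigger to get a "Run workflow" button in the Actions tab. The sketch below is not the agreed design: the file name and the loop over the src-static/ directory (named later in this thread) are assumptions, and package restoration is left out for brevity.

```yaml
# Hypothetical file: .github/workflows/update_assets.yml
name: update-assets

on:
  workflow_dispatch:  # manual trigger only; no schedule or push events

jobs:
  update-assets:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
      # Package restore (e.g. renv::restore()) omitted for brevity.
      # Assumes every static update script is a standalone .R file.
      - name: Run static asset updates
        run: |
          for script in src-static/*.R; do
            Rscript "$script"
          done
```

Every run then shows up in the Actions history with its logs, which is what makes the one-off jobs traceable.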

hannahker commented 4 months ago

@caldwellst what do you think of:

| Name | Trigger | Outputs | Description |
| --- | --- | --- | --- |
| update_assets.yml | Manual | All static assets in Azure (e.g. name mapping, map data) | Runs all update scripts in src-static/ |
| first_run.yml | Manual | Archived signals in the root signals.parquet file | Runs generate_signals.R with first_run=TRUE |
| monitor_acled.yml | TBD | Signals in the {indicator_id}/signals.parquet file, w/ associated content and draft campaigns in Mailchimp | Runs update_{indicator}.R |
| monitor_idmc.yml | TBD | Signals in the {indicator_id}/signals.parquet file, w/ associated content and draft campaigns in Mailchimp | Runs update_{indicator}.R |
| monitor_jrc.yml | TBD | Signals in the {indicator_id}/signals.parquet file, w/ associated content and draft campaigns in Mailchimp | Runs update_{indicator}.R |
| monitor_ipc.yml | TBD | Signals in the {indicator_id}/signals.parquet file, w/ associated content and draft campaigns in Mailchimp | Runs update_{indicator}.R |
| monitor_wfp.yml | TBD | Signals in the {indicator_id}/signals.parquet file, w/ associated content and draft campaigns in Mailchimp | Runs update_{indicator}.R |
| check_signals.yml | Daily | Slack notifications for when new rows are added to each {indicator_id}/signals.parquet file | Daily monitoring for new alerts that need to be triaged |

caldwellst commented 4 months ago

So, a few thoughts:

first_run.yml

first_run.yml would be strange because generate_signals() is run for a specific indicator, like in your proposed monitor_{ind}.yml files, so an overarching script would be difficult. How about parameterising the monitor_{ind}.yml scripts instead, with the parameters:

This way, if we wanted, we could test the indicator runs using generate_signals(test = TRUE) through GitHub Actions, as well as regenerate the data if and when necessary, although ideally that would never be necessary!
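A parameterised monitor workflow along these lines could be sketched with `workflow_dispatch` inputs. Everything specific here is an assumption for illustration: the input names `first_run` and `test` simply mirror the arguments mentioned in this thread, the environment variable names are made up, and the script path is a placeholder. The R script would also need to read the variables itself, for example with `Sys.getenv()`.

```yaml
# Hypothetical file: .github/workflows/monitor_acled.yml
name: monitor-acled

on:
  workflow_dispatch:
    inputs:
      first_run:  # assumed input name
        description: "Regenerate the indicator's signals from scratch"
        type: boolean
        default: false
      test:  # assumed input name
        description: "Run generate_signals() in test mode"
        type: boolean
        default: false

jobs:
  monitor:
    runs-on: ubuntu-latest
    env:
      # Assumed variable names; they reach the script as the strings "true"/"false"
      FIRST_RUN: ${{ inputs.first_run }}
      TEST: ${{ inputs.test }}
    steps:
      - uses: actions/checkout@v3
      # System setup steps (system libraries, R, renv restore) would go here.
      - name: Run ACLED update
        run: Rscript update_acled.R  # placeholder path within the repo
```

With both inputs defaulting to false, a plain manual run behaves like a normal update, and the first run or a test run is just a matter of ticking a box in the "Run workflow" dialog.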

check_signals.yml

I think the issue here is we wouldn't want to send out an email without immediately updating the dataset. It would be better if the notification was sent as soon as we APPROVE in triage_signals(), so not actually an action of its own.

system_setup.yml

Last, should we have an action file like the above for system setup that all the others use, so we know they all have sufficient machine setup for whatever we need?

hannahker commented 4 months ago

first_run.yml

check_signals.yml

system_setup.yml

steps:
      - uses: actions/checkout@v3
      - name: Install required dependencies
        # Solution to unstable Azure server access is to update apt-get
        # Found on: https://github.com/actions/runner-images/issues/675
        run: |
          sudo apt-get update
          sudo apt-get install -y -f libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev libudunits2-dev libgdal-dev
      - name: Set up R 4.2.3
        uses: r-lib/actions/setup-r@v2
        with:
          r-version: 4.2.3
      - uses: r-lib/actions/setup-pandoc@v2
      - name: Cache packages
        uses: actions/cache@v3
        with:
          path: ${{ env.RENV_PATHS_ROOT }}
          key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
          restore-keys: |
            ${{ runner.os }}-renv-
      - name: Restore packages
        run: |
          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
          renv::restore()
        shell: Rscript {0}

caldwellst commented 4 months ago

lol sorry, I think brain capacity was reaching a minimum. What I meant on check_signals.yml was that, rather than checking once a day whether signals have updated or whatever, we just attach this at the end of generate_signals(). So if update_{ind}.R, run through the monitor_{ind}.yml action, generates any signals, a Slack notification is sent immediately. That way we reduce the number of actions and we instantly know when we need to triage.

And yeah, system_setup.yml would be a better packaged version of that. Or just that, not sure how much we can simplify it, but let's see! But it looks like we could potentially use it by following this process to reuse workflows as the initial setup for all other scripts?
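One way the reuse idea could look, as a rough sketch: GitHub Actions lets a workflow declare `on: workflow_call` so that other workflows can call it. Because every job starts on a fresh runner, setup done in one job is not available to the next, so in this sketch the shared workflow takes the script to run as an input and runs it in the same job as the setup. The `script` input name, the script path and the `RENV_PATHS_ROOT` value are assumptions for illustration, not something settled in this thread. The two files are shown in one block, separated by `---`.

```yaml
# File 1 (hypothetical): .github/workflows/system_setup.yml
name: system-setup

on:
  workflow_call:
    inputs:
      script:  # assumed input name: path to the R script to run after setup
        required: true
        type: string

jobs:
  run:
    runs-on: ubuntu-latest
    env:
      RENV_PATHS_ROOT: ~/.local/share/renv  # assumed renv cache location
    steps:
      - uses: actions/checkout@v3
      - name: Install required dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y -f libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev libudunits2-dev libgdal-dev
      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: 4.2.3
      - name: Cache packages
        uses: actions/cache@v3
        with:
          path: ${{ env.RENV_PATHS_ROOT }}
          key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
      - name: Restore packages
        run: |
          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
          renv::restore()
        shell: Rscript {0}
      - name: Run the requested script
        run: Rscript "${{ inputs.script }}"
---
# File 2 (hypothetical): .github/workflows/monitor_acled.yml
name: monitor-acled

on:
  workflow_dispatch:

jobs:
  monitor:
    uses: ./.github/workflows/system_setup.yml
    with:
      script: update_acled.R  # placeholder path within the repo
    secrets: inherit  # makes the caller's secrets available to the shared workflow
```

A composite action would be another way to share only the setup steps; which of the two fits better probably depends on how much the indicator workflows end up differing from each other.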

hannahker commented 4 months ago

Ok yes this makes sense!

hannahker commented 3 months ago

Closed with #105