MarEichler / covid19_app

https://mareichler.shinyapps.io/covid19/

separate data pulling/processing from app #2

Open MarEichler opened 2 years ago

MarEichler commented 2 years ago

Currently the app pulls data at most once a day and saves the result, so later users that day don't have to pull and process the data in their own sessions.
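
As a rough illustration of that once-a-day behavior (a sketch, not the app's actual code), a session can compare the cache file's modification date with today's date and pull only when the file is missing or stale. `get_daily_data`, `cache_path`, and `pull_fn` are hypothetical names.

```r
# Sketch only: `cache_path` and `pull_fn` are hypothetical, not taken from the app.
get_daily_data <- function(cache_path, pull_fn, today = Sys.Date()) {
  # The cache counts as fresh if it exists and was last written today (local time).
  fresh <- file.exists(cache_path) &&
    format(file.info(cache_path)$mtime, "%Y-%m-%d") == format(today, "%Y-%m-%d")
  if (!fresh) {
    # First load of the day: do the slow pull once and save the result.
    saveRDS(pull_fn(), cache_path)
  }
  readRDS(cache_path)
}
```

With this shape, only the first session of the day pays for `pull_fn()`; every later session just reads the saved file.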

However, the data pull itself is slow, so the first time the app is loaded each day it takes about 30 seconds to load.

I would like to look into separating the data pulling/processing from the app. The app could then load the already-processed data, which should save time.
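
A minimal sketch of what that separation could look like, assuming a prep script that writes a processed file and an app that only reads it. The function names, the default path, and the "drop incomplete rows" step are placeholders, not the app's real sources or processing.

```r
# Prep side (hypothetical): run outside the app, e.g. on a schedule.
# `data_url` and `out_path` are placeholder names, not the app's real values.
prep_data <- function(data_url, out_path = "data/processed.rds") {
  raw <- utils::read.csv(data_url, stringsAsFactors = FALSE)
  # Placeholder processing step: drop rows with missing values.
  processed <- raw[stats::complete.cases(raw), , drop = FALSE]
  dir.create(dirname(out_path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(processed, out_path)
  invisible(out_path)
}

# App side: only read the file the prep step already wrote.
load_processed <- function(path = "data/processed.rds") {
  if (!file.exists(path)) {
    stop("Processed data not found at ", path, "; run the prep step first.")
  }
  readRDS(path)
}
```

The app then never downloads anything itself, so its startup cost is a single `readRDS()` call.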

tomeichlersmith commented 2 years ago

I know we aren't planning on talking about this more anytime soon, but I was also thinking about it (and since you made the issue...)

I think there are a few options for integrating GitHub Actions with deployment to shinyapps.io, all of which have their pros and cons. GitHub Actions can be scheduled to run on time intervals, when pushing commits to various branches, when opening PRs, when making a new release, and on many other "events" (to use the GitHub lingo). These different "triggers" are very useful and are helpful in this circumstance. Below, I lay out the design I imagine would work and the possible "triggers" that may be helpful to use.

The shinyapps.io deployment instructions are already very formulaic, which has led some R users to develop a draft GitHub action to deploy a Shiny app.[^1] This is a nice workflow, but I'm imagining something a little simpler for this app.[^2] Changing the "triggers" is just a matter of changing the YAML file controlling the GitHub action.[^3] r-lib has developed some supporting GitHub actions[^4] that we will use, and they actually have a shinyapps deployment action we can ruthlessly copy.

# this file: .github/workflows/test-and-deploy.yaml
name: test and deploy

# Controls when the action will run. 
on:
  # Triggers the workflow when you create a new release
  release:
    types: [published]
  # re-runs and re-deploys twice daily, the string says "05:30 and 17:30 UTC every day"
  schedule:
    - cron:  '30 5,17 * * *'
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains two jobs, "test" and "shiny-deploy"
  test:
    # The type of runner that the job will run on
    runs-on: ubuntu-20.04

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-r@v2
      - name: Install Dependencies
        uses: r-lib/actions/setup-r-dependencies@v2
        with:
          cache-version: 2
          extra-packages: |
            any::ggplot2
            any::rcmdcheck
          needs: |
            website
            coverage
      - name: Do prep work like downloading data
        run: Rscript prep.R
      - name: Run R scripts to make sure app is working
        run: Rscript test.R

  shiny-deploy:
    needs: test # only deploy if tests pass
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v2
      - uses: r-lib/actions/setup-pandoc@v2
      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true
      - uses: r-lib/actions/setup-renv@v2
      - name: Install rsconnect
        run: install.packages("rsconnect")
        shell: Rscript {0}
      - name: Do prep work like downloading data
        run: Rscript prep.R
      - name: Authorize and deploy app
        run: |
          # secrets are pasted into the script as text, so they must be quoted to be valid R strings
          rsconnect::setAccountInfo("${{ secrets.RSCONNECT_USER }}", "${{ secrets.RSCONNECT_TOKEN }}", "${{ secrets.RSCONNECT_SECRET }}")
          rsconnect::deployApp()
        shell: Rscript {0}
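
The workflow above calls `test.R` without showing it. A minimal sketch of what such a check could contain, assuming the app lives in `app.R` and the prep step writes an `.rds` file; `check_app`, the file names, and the required columns are all hypothetical placeholders.

```r
# test.R (hypothetical sketch): any error here makes Rscript exit non-zero,
# which fails the "test" job and blocks the deploy job.
check_app <- function(app_file = "app.R",
                      data_file = "data/processed.rds",
                      required_cols = character()) {
  # parse() throws on a syntax error without running the app.
  parse(file = app_file)
  stopifnot(file.exists(data_file))
  dat <- readRDS(data_file)
  stopifnot(is.data.frame(dat), nrow(dat) > 0)
  missing <- setdiff(required_cols, names(dat))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
```

This only catches syntax errors and obviously broken data; it does not start the Shiny app or exercise its reactive logic.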

[^1]: blog post and GitHub repo

[^2]: The biggest change I can foresee is the elimination of Docker. I don't know why the repo uses a Docker image. It seems like overkill to me, but maybe it is a significant speed improvement? I am planning to play around and see.

[^3]: "Workflow syntax for GitHub Actions" and "Events that trigger workflows" are the documentation pages I look at a lot.

[^4]: setup-r and setup-r-dependencies