Add a minimal test of entire workflow

calliope-project / solar-and-wind-potentials

Estimation of solar and wind power generation potentials in Europe.

MIT License


Add a minimal test of entire workflow #11

Open timtroendle opened 3 years ago

timtroendle commented 3 years ago

As we are adding changes to this repo from time to time now, it would be good if there was a continuous integration test. For that, the workflow must be 100% automatic, and we should have a configuration that requires minimal downloads and minimal runtime. We can then use a simple GitHub action that runs Snakemake with this configuration (example).
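A minimal sketch of what such a CI entry point could look like, written as a small Python wrapper that a GitHub Actions step might call. The file name `config/minimal.yaml` is hypothetical; nothing in this thread confirms that such a configuration exists yet. Only the Snakemake options `--configfile`, `--cores`, and `-n` (dry run) are used.

```python
import subprocess


def snakemake_command(configfile, cores=1, dry_run=False):
    """Build the Snakemake command line for a given configuration file."""
    cmd = ["snakemake", "--configfile", configfile, "--cores", str(cores)]
    if dry_run:
        # -n only resolves the DAG of jobs; nothing is downloaded or computed.
        cmd.append("-n")
    return cmd


def run_workflow(configfile, cores=1, dry_run=False):
    """Run Snakemake and return its exit code (0 means success)."""
    completed = subprocess.run(snakemake_command(configfile, cores, dry_run))
    return completed.returncode
```

A CI step could then call `run_workflow("config/minimal.yaml")` and fail the job on a non-zero return value. A dry run is a cheap first check, but it would not exercise the download rules.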

timtroendle commented 3 years ago

Here's a list of things that need to be solved so that we have a 100% automatic, low data, low runtime workflow test:

[ ] automatic download of ESM data
[ ] automatic download of EEZ data
[ ] automatic download of renewables.ninja data
[ ] download of NUTS with lower resolution
[ ] download of LAU with lower resolution
[ ] download of SRTM data based on scope configuration
[ ] download of GMTD data based on scope configuration
[ ] download of WDPA data based on scope configuration (is this possible?)
[ ] configurable geospatial resolution
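For the scope-based downloads, one possible approach is to derive the list of required tiles from the bounding box in the configuration. The sketch below assumes a regular grid of 5° x 5° tiles whose column 1 starts at 180°W and whose row 1 starts at 60°N, counting southwards. The grid layout and the parameter names `x_min`, `x_max`, `y_min`, `y_max` are assumptions for illustration and should be checked against the actual data source and configuration.

```python
import math

TILE_SIZE_DEG = 5   # assumption: each tile covers 5 x 5 degrees
ORIGIN_LON = -180   # assumption: column 1 starts at 180°W
ORIGIN_LAT = 60     # assumption: row 1 starts at 60°N, rows count southwards


def tiles_for_scope(x_min, x_max, y_min, y_max):
    """Return (column, row) indices of all tiles intersecting the bounding box.

    Coordinates are in degrees: x is longitude, y is latitude.
    """
    col_first = math.floor((x_min - ORIGIN_LON) / TILE_SIZE_DEG) + 1
    col_last = math.ceil((x_max - ORIGIN_LON) / TILE_SIZE_DEG)
    row_first = math.floor((ORIGIN_LAT - y_max) / TILE_SIZE_DEG) + 1
    row_last = math.ceil((ORIGIN_LAT - y_min) / TILE_SIZE_DEG)
    return [
        (col, row)
        for col in range(col_first, col_last + 1)
        for row in range(row_first, row_last + 1)
    ]
```

With a small scope such as `tiles_for_scope(6, 10, 46, 48)`, only a single tile is needed, which is what a low-data test configuration would rely on.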

brynpickering commented 3 years ago

Lovely idea, but I can't say I have any idea how it would be feasible... We could pre-package a bunch of datasets at lower res / smaller scope, but then we lose out on being able to test any of the workflow rules which act to access these datasets.

For WDPA, see #12 for a code snippet to automatically handle the constantly changing URL, but it looks like you can't choose to download only a section.

timtroendle commented 3 years ago

It seems to be possible to cache downloads in GitHub Actions (up to 5GB for up to a week). This may help.

brynpickering commented 3 years ago

Or if we use Azure pipelines for CI, we get "unlimited" cache for up to a week: https://docs.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops

timtroendle commented 3 years ago

Then again, I just ran into an error running a euro-calliope workflow that was built for v7 of the hydro stations data but picked up a cached download of v4.

If we were to cache downloads, we'd need to make sure the cache is wiped whenever necessary. I don't see a trivial way of doing so right now.
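One non-trivial but common way to handle this is to derive the cache key from the data source definitions themselves, so that any change to a URL or version produces a new key and the stale cache is simply never hit. A minimal sketch, assuming the data sources are available as a dictionary (the entries and the `example.org` URLs below are made up for illustration):

```python
import hashlib
import json


def cache_key(data_sources):
    """Derive a short, deterministic cache key from the data source definitions.

    Any change to a URL or version string yields a different key.
    """
    # sort_keys makes the serialisation independent of dictionary ordering.
    canonical = json.dumps(data_sources, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical entries, for illustration only.
sources_v4 = {"hydro-stations": {"url": "https://example.org/hydro.zip", "version": "v4"}}
sources_v7 = {"hydro-stations": {"url": "https://example.org/hydro.zip", "version": "v7"}}
```

Here `cache_key(sources_v4)` and `cache_key(sources_v7)` differ, so a workflow built for v7 would not pick up a cache filled with v4. This only works if every version change is visible in the configuration; a dataset that changes behind a stable URL would still go unnoticed.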

timtroendle commented 3 years ago

In theory, we could also use our own test runners which would make caching easy. @suvayu, you mentioned using own runners in GitHub actions earlier. Do you have any idea how much work that would be? Also, do we have machines that we could use this way?

suvayu commented 3 years ago

@timtroendle I think it is some amount of work (but not an unreasonable amount given the flexibility and control that you gain). These are the hurdles I see:

1. we need to install the runner application (it doesn't look like this requires admin privileges)
2. we need the runner to be accessible from the Internet (no VPN)
3. the environment setup is not clear to me, e.g. will workflows with the usual pip/conda actions continue to work unmodified (other than the small edit required to specify the runners)?

ETH IT can help with 1 & 2. For 1, doing it ourselves also doesn't seem difficult. I don't know if ETH security policy will get in the way of 2. As for 3, I have no idea: it seems to be either no work or quite a bit of work, with no middle ground.

Docs

suvayu commented 3 years ago

BTW, if someone can deal with the "getting resources & permissions from ETH IT" part, I volunteer for the rest ;)

timtroendle commented 3 years ago

Thanks a lot @suvayu. I will see what I can do about 1. and 2. Can you clarify 3. a little more? What exactly could be the problem here?

suvayu commented 3 years ago

steps:
  - uses: actions/checkout@v2
  - name: Set up Python ${{ matrix.python-version }}
    uses: actions/setup-python@v2
    with:
      python-version: ${{ matrix.python-version }}

When a workflow has a step like this, the actions/setup-python action runs on the runner, and I don't know:

how many of these actions would be supported by the runner application. I would guess that this one might be supported, but something like peaceiris/actions-gh-pages might not be. So there is probably some amount of work required to make sure they are separated into different jobs (most likely straightforward).
how much setup the environment needs. For pip it might be simple, but for conda or docker it might require local resources or setup.

timtroendle commented 3 years ago

A short update: USYS IT is looking into this right now. If we are very lucky, we may be able to set this up ahead of the lost siblings sprint, which would be a major plus.