brynpickering opened this issue 4 years ago
I thought about it when starting to build euro-calliope and decided against it. The main reason was the difficult data dependencies, especially those involving Swiss data.
However, over time the ties from euro-calliope to this repo have grown stronger (for example, your first reason did not exist initially), so it's a good idea to revisit the decision.
I think better modularisation could give us the benefits of tighter integration while avoiding the downsides of the data dependencies. The most straightforward step would be to kick the Swiss data analysis out of this repo and add a structural break (via Zenodo) between this repo and the Swiss data. In addition, we could split this repo once more: into the assessment of potentials and into the parts related to the "Home-made" paper. The latter split isn't critical, but it would make the potentials repo much cleaner.
I would be a fan of breaking this repo off from the paper, making it a repo that can act as a dependency of both the paper repo and the euro-calliope paper. That way you can pin the data submodule to a fixed commit in the paper repo and everything will still work. I expect that changes to the data generation workflow will make the paper either no longer make sense or simply not run, and it doesn't make sense to keep the paper updated alongside everything else.
Still, one downside remains: building euro-calliope will require many more gigabytes of data, and it will take a few hours longer on a cluster or many hours longer on a laptop.
Can we design it in a way that makes this optional? Then you could use either the Zenodo data or this workflow.
Yeah, in the same way that you ask users to download certain datasets before starting, you could include this one as optional (providing a drop-in for raw-potentials.zip). If the user doesn't provide it, the workflow triggers the submodule to generate the zip (see the sketch below). Ideally, the overlapping datasets (EEZ, ESM, capacity factors) could simply be symlinked if the submodule is triggered.
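A minimal Snakefile sketch of that fallback logic; the paths, the `package` target name, and the submodule location are all assumptions for illustration, not names from the actual repos:

```python
import os

# Hypothetical fallback: use a user-provided "data/raw-potentials.zip" if it
# exists, otherwise build the archive via the submodule's workflow. All paths
# and the "package" target are assumptions, not names from the actual repos.
if os.path.exists("data/raw-potentials.zip"):
    rule raw_potentials:
        # User supplied the packaged potentials; copy it into the build folder.
        input: "data/raw-potentials.zip"
        output: "build/raw-potentials.zip"
        shell: "cp {input} {output}"
else:
    rule raw_potentials:
        # Fall back to generating the archive from the submodule's workflow.
        output: "build/raw-potentials.zip"
        shell:
            "snakemake --snakefile solar-and-wind-potentials/Snakefile "
            "--directory solar-and-wind-potentials --cores 4 package && "
            "cp solar-and-wind-potentials/build/raw-potentials.zip {output}"
```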
Or could specific, tagged euro-calliope commits point to packaged Zenodo data?
Instead of packaging up the output of this repository and then pulling that data from Zenodo when running the Euro-Calliope workflow, I think it would be better to have this as a submodule of Euro-Calliope, similar to how Euro-Calliope is a submodule of the OSE model workflow. Some reasons for this:
a. They share a lot of the same input data, so there isn't much overhead in terms of preparing the datasets. In fact, downloading and generating shapefiles (one of the more time-intensive tasks) only needs to be done once.
b. If a user wants to change something in Euro-Calliope that leads to needing different technical potential data, they have to wait for it to be re-packaged on Zenodo. This includes changing the spatial scope (see timtroendle/solar-and-wind-potentials#1) and using a different spatial resolution (e.g. NUTS3).
c. You could re-generate just the technical eligibility data for the resolutions of interest for your energy system model (and ignore the report generation), so the time/memory penalty would be low (a sketch of how this could look follows this list).
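To illustrate point c., here is a sketch of how Euro-Calliope could pull in only the eligibility outputs via Snakemake's subworkflow directive; the file paths and rule names are invented for illustration, not taken from either repo:

```python
# Hypothetical subworkflow wiring, assuming the potentials repo sits in a git
# submodule at ./solar-and-wind-potentials. Snakemake only runs the submodule
# rules needed to produce the requested file, so report generation is skipped.
subworkflow potentials:
    workdir: "solar-and-wind-potentials"
    snakefile: "solar-and-wind-potentials/Snakefile"

rule technical_potential:
    input:
        # Invented output path: eligibility data at the resolution of interest.
        potentials("build/national/technically-eligible-land.csv")
    output: "build/data/technical-potential.csv"
    shell: "cp {input} {output}"
```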