Track provenance information

huard commented 4 years ago

There is a push in the IPCC to track provenance information for the figures and tables it publishes in its assessment reports. In WGI, ESMValTool and Climate4R have implemented provenance tracking mechanisms around the PROV standard (Climate4R has defined an extension to the stock ontology called metaclip, see metaclip.org for an example). I was wondering if pyam developers had considered similar mechanisms?

danielhuppmann commented 4 years ago

Thanks @huard for raising this important topic!

`pyam` used for recent IPCC reports

For completeness, let me first point to how we handled this in recent work related to IPCC reports.

For the IPCC SR15, the open-source Jupyter notebooks (using pyam) to create a number of figures, tables and statements in the report are available in rendered format and on GitHub. (doi: 10.22022/SR15/08-2018.15428)

The data used in these notebooks is available via the IAMC 1.5°C Scenario Explorer, with references to the underlying manuscripts on the About page and via individual data panels. (doi: 10.22022/SR15/08-2018.15429)

For the recently published IPCC SRCCL, @kvcalin used pyam for some of her scenario assessment with an updated version of the same data. She may still publish a suite of notebooks for reproducibility and transparency (time permitting, I guess).

Comment: If such notebooks could be published in parallel and cross-referenced in future reports, that would be an (admittedly low-key, unsophisticated, unstructured) way to increase transparency and show provenance information.

The data model in `pyam`

A pyam.IamDataFrame consists of a data table containing timeseries information and a meta table containing qualitative and quantitative indicators. One could see the "source" of a scenario as a meta-indicator, using df.set_meta() to mark a certain set of scenarios as originating from a particular manuscript or project. See cells 61-64 in this notebook for a rather clumsy way of using this feature to map scenarios to source manuscripts for the SR15 assessment.
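
The idea of treating the source as a meta-indicator can be illustrated without pyam itself. The following is a plain-Python sketch, not pyam's API: the `meta` dict merely stands in for the meta table, the helpers `set_source` and `scenarios_from` are hypothetical, and the model, scenario and source names are made up for illustration.

```python
# Stand-in for the meta table: one row of indicators per (model, scenario).
# All names below are hypothetical and only serve the illustration.
meta = {
    ("model_a", "scen_1"): {},
    ("model_a", "scen_2"): {},
    ("model_b", "scen_1"): {},
}


def set_source(meta, source, index):
    """Mark every (model, scenario) pair in `index` as originating from `source`."""
    for key in index:
        if key not in meta:
            raise KeyError(f"unknown scenario: {key}")
        meta[key]["source"] = source


def scenarios_from(meta, source):
    """Return the sorted (model, scenario) pairs attributed to `source`."""
    return sorted(k for k, v in meta.items() if v.get("source") == source)


set_source(meta, "Manuscript X", [("model_a", "scen_1"), ("model_a", "scen_2")])
set_source(meta, "Project Y", [("model_b", "scen_1")])
```

With the mapping in place, `scenarios_from(meta, "Manuscript X")` lists the scenarios attributed to that manuscript, which is roughly the lookup that a "source" meta-indicator would enable.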

Going forward

Adding an RDF graph to pyam for tracking entities, agents and activities on an IamDataFrame and embedding it in resulting figures would be feasible in principle (using the Python package prov, for example), but I don't see the resources to implement this in a realistic timeframe. Publishing notebooks together with a graph/figure or table seems, to me, like the more realistic way forward for the time being.
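
To make the entity/agent/activity idea concrete, here is a minimal sketch of what such a record could hold. It uses only the standard library and deliberately does not use the prov package or any pyam API; the `ProvRecord` class and all identifiers and labels are assumptions made up for this illustration. Only the relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasDerivedFrom`) are taken from the PROV vocabulary.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvRecord:
    """Hypothetical container for PROV-style statements (not the prov package)."""

    entities: dict = field(default_factory=dict)
    agents: dict = field(default_factory=dict)
    activities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def relate(self, kind, subject, obj):
        # Store a relation such as used(activity, entity).
        self.relations.append({"type": kind, "from": subject, "to": obj})

    def to_json(self):
        # Serialize the whole record to a string with stable key order.
        return json.dumps(asdict(self), sort_keys=True)


record = ProvRecord()
record.entities["data"] = {"label": "scenario ensemble (hypothetical)"}
record.entities["figure"] = {"label": "figure (hypothetical)"}
record.agents["analyst"] = {"label": "analyst (hypothetical)"}
record.activities["plot"] = {"label": "plotting notebook (hypothetical)"}

record.relate("used", "plot", "data")
record.relate("wasGeneratedBy", "figure", "plot")
record.relate("wasAssociatedWith", "plot", "analyst")
record.relate("wasDerivedFrom", "figure", "data")

serialized = record.to_json()
```

The resulting `serialized` string is the kind of payload that could, in principle, be stored alongside a figure or written into its file metadata; how to embed it would depend on the plotting backend and is left open here.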

huard commented 4 years ago

Thanks @danielhuppmann for the detailed answer and your view on what is realistically achievable.

gidden commented 4 years ago

Hi all - FWIW, we employ semantic versioning, continuous integration, and deploy installable packages via pip and conda. As this is a two-to-three-person operation at the moment without any funding mechanism, I do not expect we will go further without additional resources.

IAMconsortium / pyam