Clarify the input data for the plots without financial data

maurolepore commented 10 months ago

Clarify the input data for the plots, especially with regard to this issue here. -- @AnneSchoenauer

Note for my future self:

I opened this issue on behalf of Anne, as I turned her roadmap from a list of TODOs in a Google document into a collection of issues in the "Insights" milestone.

AnneSchoenauer commented 10 months ago

Dear @maurolepore,

Context: there are three kinds of plots. Please refer to the Google Document here again. For each of the three kinds, we need a different dataset.

Please note that I now refer only to the product level, as the input product level still needs some enhancement. So it is best to focus for now only on the emission profile (formerly PCTR) and the sector profile (formerly PSTR).

1. Database insights, which are divided into:

1.1 Descriptives of database without indicators

This would be the pre-processed web-scraped data (so the NLP step is done), plus all sector mapping done, plus some information that comes from the raw web-scraped data from ep. In my understanding, this data would be the data that is the output of the tilt indicator before part. However, to be 100% sure it would be great if you could pinpoint me to the data that you put in the "normal" tiltIndicator package.

You can find here example data of how the data would look like. Please note that

in green you can see the columns that should come from the TiltIndicatorBefore package.
in yellow you can see the columns that are included in the raw data - the one that you web-scraped from ep.
in red you can see the columns that I wanted to have included (see this issue here).

1.2 Descriptives of database with indicators

This would be the data that comes out of the tiltIndicatorAfter package. You can find here some sample data. Again

in green you can see the columns that should already be included.
in red you can see the columns that I wanted to have included (see this issue here).

This dataset is also the data that Tilman gave me for the Bundesbank. You can find this data here.

maurolepore commented 10 months ago

I polished the title and removed a label to more accurately reflect that this issue refers to all plots without financial data -- including with and without indicators.

maurolepore commented 10 months ago

@AnneSchoenauer

... it would be great if you could pinpoint me to the data that you put in the "normal" tiltIndicator package.

The example at the tiltIndicator website shows library(tiltToyData). If you click on "tiltToyData" you'll land at its website. That website shows a bit of every dataset, and also shows how to read any specific dataset. For example, you can read the dataset "emissions_profile_products" with read_csv(toy_emissions_profile_products()).
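As a minimal sketch of that reading step, assuming the tiltToyData and readr packages are installed and that toy_emissions_profile_products() returns something read_csv() can read (which is how it is used above):

```r
# Minimal sketch; assumes tiltToyData and readr are installed.
library(tiltToyData)
library(readr)

# Assumption: the toy_*() helper returns the location of a toy CSV file,
# since the comment above passes it straight to read_csv().
path <- toy_emissions_profile_products()
products <- read_csv(path, show_col_types = FALSE)

# Inspect which columns the toy dataset provides
names(products)
```

The same pattern should apply to the other toy datasets listed on the tiltToyData website.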

You can find here example data of how the data would look like.

Which dataset are you pointing me to? Did you mean to add a link behind "here"?

This dataset is also the data that Tilman gave me for the Bundesbank. You can find this data here.

The link behind "here" points me to a google drive with multiple files and folders. Which one do you want me to see?

maurolepore commented 10 months ago

@AnneSchoenauer

Once we discuss the items in my comment above, I would consider this issue done and would close it. This would indicate that you have provided the clarifications you think necessary to regain this context whenever we need this information to develop specific plots.

AnneSchoenauer commented 10 months ago

@maurolepore

This is fine for me! I think it becomes important for Linda and for you when you work on the plots? But happy if we close it and bring it up once it is important again!

maurolepore commented 10 months ago

OK. But before I close it, note that I have two questions above.

AnneSchoenauer commented 9 months ago

Hi @maurolepore,

Please find my answers below:

The example at the tiltIndicator website shows library(tiltToyData). If you click on "tiltToyData" you'll land at its website. That website shows a bit of every dataset, and also shows how to read any specific dataset. For example, you can read the dataset "emissions_profile_products" with read_csv(toy_emissions_profile_products()).

Thanks a lot for this. This makes a lot of sense and I understand it better now. I also pointed Bob to it again. I thought that Bob, Tilman and Kalash had reviewed it, but I will also make sure to run the whole package next week to see if I find any mistakes in the methodology, data etc.

Which dataset are you pointing me to? Did you mean to add a link behind "here"?

Embarrassing! The whole ticket was about creating some sample data (I was on the train when I wrote the ticket, so most likely it didn't upload or I forgot about it). Anyhow, I have now linked the sample data above.

The link behind "here" points me to a google drive with multiple files and folders. Which one do you want me to see?

Yes, that is right, as all of the data are the same data, just for different jurisdictions. I would suggest that you look at the folder ALL_countries, which is here. You can then find two datasets per indicator: one at the company level and one at the product level. I always used the product level when I created the plots, so I would suggest that you start by looking at pctr_product_level.zip. This is the data for the emission profile.

Please note for context: the files that you can see in the folder here are the data that we shared with the banks. The reason why we haven't shared the tiltIndicator package with them yet is that it needs raw data from ecoinvent as input, and the banks don't have licenses with ecoinvent yet. Therefore, we calculated the data for them and sent them the output file produced after running the tiltIndicator package. In the future, however, we would like to have a process in which all banks have their own license and can calculate the indicators themselves. I hope this helps.

maurolepore commented 9 months ago

Thanks @AnneSchoenauer and cc @lindadelacombaz

For emissions profile at product level, it's now clear that the output of tiltIndicator is already connected to the input of tiltPlot.

Here is a reprex showing the connection end to end (after some adaptor code flagged with FIXME).

library(tiltToyData)
library(tiltIndicator)
library(tiltPlot)
library(dplyr, warn.conflicts = FALSE)
library(readr, warn.conflicts = FALSE)

options(readr.show_col_types = FALSE)

companies <- read_csv(toy_emissions_profile_any_companies())
products <- read_csv(toy_emissions_profile_products())

emissions_profile <- emissions_profile(companies, products)
at_product_level <- unnest_product(emissions_profile)

# FIXME: Adapt the output of tiltIndicator to the input of tiltPlot
all_companies <- at_product_level |> 
  rename(xctr_risk_category = risk_category, benchmark = grouped_by)

plot_xctr(all_companies)

This also shows that tiltPlot does not need new toy datasets -- the toy datasets in tiltToyData are enough.

At this early stage of development, I think the "beautification" of column names is a bad idea and we should prefer consistency instead. The tiltIndicator package outputs ugly names, but they are consistent across all indicators. If we stick to those ugly names, we can feed the output of tiltIndicator directly into tiltPlot. In any case, if we build an interactive tool on top of our packages, the ugly names will remain hidden in the back-end and the users would only see more polished output in the front-end. If we go that way, we may not need tiltIndicatorAfter, at least not for now.
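To illustrate what such adaptor code costs when names diverge, here is a rough sketch in base R. The helper rename_columns() is hypothetical (it is not part of tiltIndicator or tiltPlot), and the toy data frame, including the companies_id column and all values, is made up for illustration. Only the two renames come from the reprex above.

```r
# Hypothetical helper, not part of tiltIndicator or tiltPlot.
# `mapping` is a named character vector: names are the new column names,
# values are the existing column names.
rename_columns <- function(data, mapping) {
  missing_cols <- setdiff(mapping, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  idx <- match(mapping, names(data))
  names(data)[idx] <- names(mapping)
  data
}

# Made-up stand-in for tiltIndicator output at product level
toy <- data.frame(
  companies_id = c("a", "b"),
  grouped_by = c("all", "unit"),
  risk_category = c("high", "low")
)

# The same two renames as in the FIXME step of the reprex
adapted <- rename_columns(
  toy,
  c(xctr_risk_category = "risk_category", benchmark = "grouped_by")
)
names(adapted)
```

If tiltPlot accepted the tiltIndicator names directly, this step would not be needed at all, which is the point of preferring consistency.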

maurolepore commented 9 months ago

As this is now clear to me, I'll go ahead and close this issue. Thanks @AnneSchoenauer.

2DegreesInvesting / tiltPlot

Clarify the input data for the plots without financial data #45