Open bergalli opened 9 months ago
Dear @bergalli, I am currently trying to run the trisk function adapted to my values. Could you please explain the meaning of the variables net_profit_margin_rawdata_ratio, debt_equity_ratio and volatility_rawdata_ratio in the tests/testthat/test_data/ST_INPUTS_DEV/prewrangled_financial_data_stress_test.csv file?
Why are these variables categorical, and in general, is there a variable dictionary that I may have missed?
Many thanks in advance for your answer!
Hi @Vladlenman, the variables you mention are remnants of the pre-processing applied to the financial data. They have no effect on the model; they are only used to run data quality checks later on.
More details about the preprocessing: because companies' financial indicators are only partially matched to their production data, the preprocessing fills in values for missing companies using averages over sector/country. Those variables then indicate whether the values are inferred or not. This is where the columns are created (in another repo): https://github.com/2DegreesInvesting/STDataMGMT/blob/7915cc2aa0df1e7b5b5daaf21b9daaf11803f5a6/R/prepare_prewrangled_financial_data_stress_test.R#L494
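To make the idea concrete, here is a minimal base R sketch of that kind of imputation. It is not the code from STDataMGMT: the toy data, the `pd_source` column and its labels are assumptions made for illustration only.

```r
# Hypothetical toy data: one row per company; NA marks a missing raw value.
financial <- data.frame(
  company_id = c("A", "B", "C", "D"),
  ald_sector = c("Power", "Power", "Coal", "Coal"),
  country    = c("DE", "DE", "US", "US"),
  pd         = c(0.02, NA, 0.05, NA),
  stringsAsFactors = FALSE
)

# Mean of the observed values within each sector/country group.
group_mean <- ave(
  financial$pd, financial$ald_sector, financial$country,
  FUN = function(x) mean(x, na.rm = TRUE)
)

# Record where each value comes from before overwriting anything
# (hypothetical flag column, standing in for the categorical variables).
financial$pd_source <- ifelse(is.na(financial$pd), "sector_country_average", "raw")

# Fill the gaps with the group average; observed values are left untouched.
financial$pd <- ifelse(is.na(financial$pd), group_mean, financial$pd)

print(financial)
```

The flag column is categorical because it only records the origin of a value, which is what makes it usable for data quality checks without feeding into the model.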
Also, for information: this repository will be archived after the summer, and the work on Trisk will continue here instead: https://github.com/Theia-Finance-Labs/trisk.model
Best,
Dear @bergalli,
Thank you for your quick response. I greatly appreciate it, as it has helped me move forward. I apologize for inundating you with questions, but I am currently navigating through the repositories and trying to understand the connections between the files.
Currently, I am working with input files located in tests/testthat/test_data/ST_INPUTS_DEV. Specifically, I am placing my data into abcd_input_test and prewrangled_financial_data. Everything works well as long as I use the "Global" scenarios in the scenario_geography column of the abcd_file. However, when I attempt to change the geography, I encounter the following error:
-- Validating input arguments.
-- Reading input data from designated input path.
-- Processing input data.
Joining with by = join_by(scenario, ald_sector, ald_business_unit)
Joining with by = join_by(ald_sector, ald_business_unit)
Error in purrr::map():
ℹ In index: 1.
Caused by error in map2():
ℹ In index: 1.
ℹ With name: year.
Caused by error in stop_if_empty():
! Stopping calculation, dataset Production Data is empty.
I always verify whether the scenario is accessible for the specified region using the get_scenario_geography_x_ald_sector function. I have ensured that the combinations between geography and scenario exist in the file.
I am running the run_trisk function with the following parameters: run_trisk(input_path = "", output_path = "", baseline_scenario = "WEO2021_APS", shock_scenario = "WEO2021_SDS")
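One way to narrow down an "empty Production Data" error is to check, outside the package, which sectors have production rows for the chosen geography and are also covered by both scenarios. The sketch below uses base R on toy tables; the data and the helper `usable_sectors()` are hypothetical and not part of the package, and this is only one possible cause of the error, not a confirmed diagnosis.

```r
# Hypothetical stand-ins for the production (abcd) and scenario input tables.
abcd <- data.frame(
  company_id         = c("A", "B", "C"),
  scenario_geography = c("Global", "Global", "Europe"),
  ald_sector         = c("Power", "Coal", "Power"),
  stringsAsFactors   = FALSE
)
scenarios <- data.frame(
  scenario           = c("WEO2021_APS", "WEO2021_SDS", "WEO2021_APS"),
  scenario_geography = c("Global", "Global", "Europe"),
  ald_sector         = c("Power", "Power", "Power"),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: for one geography, return the sectors that have
# production rows AND are covered by every requested scenario.
usable_sectors <- function(abcd, scenarios, geography, scenario_names) {
  prod_sectors <- unique(abcd$ald_sector[abcd$scenario_geography == geography])
  in_geo <- scenarios[scenarios$scenario_geography == geography, ]
  per_scenario <- lapply(scenario_names, function(s) {
    unique(in_geo$ald_sector[in_geo$scenario == s])
  })
  # Start from the production sectors and keep only those in every scenario.
  Reduce(intersect, per_scenario, prod_sectors)
}

pair <- c("WEO2021_APS", "WEO2021_SDS")
global_sectors <- usable_sectors(abcd, scenarios, "Global", pair)
europe_sectors <- usable_sectors(abcd, scenarios, "Europe", pair)
print(global_sectors)
print(europe_sectors)
```

In this toy data the Europe geography exists for the baseline scenario but not for the shock scenario, so no sector survives the intersection even though each geography/scenario combination looks plausible when checked one at a time.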
Additionally, I have a small question about the PD values in tests/testthat/test_data/ST_INPUTS_DEV/prewrangled_financial_data_stress_test.csv. The PDs there are usually close to 0.5. I understand that you get this data from Eikon, which I do not have access to. Do these companies really have a probability of default of 50%, or is this value already expressed in percent, so that 0.5 in the file actually means 0.5%?
I would appreciate any help with these issues.
Best regards,
Hi @Vladlenman ,
No worries, it's always nice to see interest in the project :)
I think your issue stems from the fact that you haven't set the input and output paths as parameters: input_path should point to where the input data is stored, and output_path to an existing directory where the results will be written.
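As a minimal sketch of that setup, assuming the working directory is the repository root and the package has been loaded (for example with devtools::load_all()), the paths could be prepared as follows. The output location under tempdir() is an arbitrary choice for illustration, and only arguments already shown in this thread are passed to run_trisk.

```r
# Input data: the synthetic test inputs discussed in this thread
# (path relative to the repository root, which is an assumption here).
input_path <- file.path("tests", "testthat", "test_data", "ST_INPUTS_DEV")

# Output directory: must exist before the run; location chosen arbitrarily.
output_path <- file.path(tempdir(), "trisk_outputs")
if (!dir.exists(output_path)) {
  dir.create(output_path, recursive = TRUE)
}

# Only call the model when both the inputs and the function are available.
if (dir.exists(input_path) && exists("run_trisk")) {
  run_trisk(
    input_path = input_path,
    output_path = output_path,
    baseline_scenario = "WEO2021_APS",
    shock_scenario = "WEO2021_SDS"
  )
} else {
  message("Inputs or run_trisk() not found; check the working directory and load the package first.")
}
```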
I made this notebook for another project; it replicates the Trisk methodology end to end using the synthetic data you already saw. That should help you set up. Let me know if you have issues accessing it.
https://colab.research.google.com/drive/1mVFSQxOVMoIE-t5GK2StJPkcSD0_OwqK?usp=sharing
Dear @bergalli,
Many thanks for the shared Colab file. It helped me repeat the whole process and get the desired plots. I have only a few theoretical and data management questions left about the process, but before that I would like to clarify some technical issues that I observed, and I would be very thankful for your answers.
- Unfortunately, I did not manage to run the function for any scenario geography other than Global; even for the synthetic dataset in this repository it works only for the Global geography. Did I get it right that it is currently only possible to run the function for the Global geography?
- Currently, in your Colab file, read_portfolio_csv sets "asset type" to "fixed income" and ald_location to "unknown" by default. Are there any alternatives (I am especially interested in whether "asset type" can be set to equity shares), or is this still work in progress?
Other questions relate more to the variable descriptions and the interpretation of the results:
- Where exactly do the net_profits in the company_trajectories file come from? Are they the product of the company production and the prices from one of the input files?
- What does fair_share_perc in scenario_analysisinput stand for?
- What does carbon_tax in the ngfs_carbon_price dataset stand for? In particular, why is it zero for all regions before 2025?
- What are the sources of your carbon price information? Specifically, for the datasets ngfs_carbon_price, scenario_analysisinput and price_data_long. I do not need the specific datasets or access, just the names of the sources.
And if you have some more energy, I would be interested in your future development plans:
- Do you plan to add more scenarios or approaches for steel manufacturers, or does that depend on the climate scenarios?
- Did I get it right that the function to test many portfolios at once is not ready yet, but that the Colab file already contains some developments in that direction?
- What is the process for creating the industry scenarios? Is it possible to add one's own scenarios to the code, potentially covering new industries, to enlarge the set of companies that can be assessed?
Many thanks in advance for your help. I enjoy engaging with your project!
devtools::load_all()