JerBouma / FinanceToolkit

Transparent and Efficient Financial Analysis
https://www.jeroenbouma.com/projects/financetoolkit
MIT License
2.96k stars 355 forks source link

[IMPROVE] Calculate bulk financial ratios without using toolkit #41

Closed sword134 closed 1 year ago

sword134 commented 1 year ago

I've created a stock universe consisting of 1000 different stocks from which I in a for loop fetch the income statement, balance sheet and cash flow using Yahoo finance. However currently there doesnt seem to be functionality to use the collect_all_ratios() function on such a dataset. Each loop generates 3 datasets for income statement, balance sheet and cash flow for each ticker in the universe. The idea is then to apply bulk ratio calculations and save those files.

I've tried adjusting the input parameters for the Toolkit() class as follows:

companies = Toolkit(tickers=ticker, balance=balance, income=income, cash=cash_flow, format_location=dir_path + "\\normalization_files")

However this yields an IndexError: Too many levels: Index has only 1 level, not 2

So I try to mitigate this error using:

balance.index = pd.MultiIndex.from_product([[ticker], balance.index])
    income.index = pd.MultiIndex.from_product([[ticker], income.index])
    cash_flow.index = pd.MultiIndex.from_product([[ticker], cash_flow.index])

However that then throws a ValueError: No axis named columns for object type Series

What can I do?

JerBouma commented 1 year ago

Could you provide me with a sample of your dataset? Then I can have a look. Also, did the example about using external datasets not work out for you? There is a Notebook about this.

The first error seems to relate exactly to the issue that is solved in the Notebook. You need to have a Multi Index defined like I do in the example.

sword134 commented 1 year ago

Could you provide me with a sample of your dataset? Then I can have a look. Also, did the example about using external datasets not work out for you? There is a Notebook about this.

Sure, i've uploaded them to a little repo: https://github.com/sword134/debug_financialtoolkit

EDIT: I've also created a folder for how the data looks like after I multi index it

JerBouma commented 1 year ago

Can you also include the normalisation files? Also can you have a look at this example if it doesn't already resolve your issue: https://github.com/JerBouma/FinanceToolkit#working-with-other-datasets

sword134 commented 1 year ago

Can you also include the normalisation files? Also can you have a look at this example if it doesn't already resolve your issue: https://github.com/JerBouma/FinanceToolkit#working-with-other-datasets

Uploaded to the repo. However the normalization files are just the default ones. It shouldnt (?) have an impact on these errors since an except handles whether or not the column with the correct name exists or not.

I've looked at that documentation quite intensively and tried debugging through changing the source code as well. The problem here is that I am using a for loop to loop through around 1000 tickers and I therefore cannot generate all the necessary variables to feed the Toolkit() class

JerBouma commented 1 year ago

Got it, I'll look later today or tomorrow at the issue. I understand your use-case and might expand the Toolkit with this functionality accordingly. Will keep you posted.

sword134 commented 1 year ago

Got it, I'll look later today or tomorrow at the issue. I understand your use-case and might expand the Toolkit with this functionality accordingly. Will keep you posted.

Thank you!

sword134 commented 1 year ago

Looking a bit further into it outside of just the "AIR" ticker, it seems that Yahoo finance can have several different balance activities differing on the company, afterall its their filing and if they want to seperate two balance sheet items they can as long as they are within regulation. This gives quite a large amount of encodings that would need to be done in the normalization. Perhaps one could use an LLM like chatgpt in python to make it create the normalization based on all the different balance entries? I figure the same problem is present in the income statement and cash flow.

I will try to see if the above mentioned method will yield any results or just gibberish tomorrow or the day after

JerBouma commented 1 year ago

Ok so the error you are receiving is mostly due to that there is so many incorrect names in the normalisation files that it doesn't work. E.g. if I do the following:

from financetoolkit import Toolkit
from financetoolkit.base import helpers

balance = pd.read_excel('AIR_balance.xlsx', index_col=0)
cash = pd.read_excel('AIR_cash.xlsx', index_col=0)
income = pd.read_excel('AIR_income.xlsx', index_col=0)

balance_grouped = helpers.combine_dataframes(['A', 'B', 'C'], balance, balance, balance)
cash_grouped = helpers.combine_dataframes(['A', 'B', 'C'], cash, cash, cash)
income_grouped = helpers.combine_dataframes(['A', 'B', 'C'], income, income, income)

toolkit = Toolkit(['A', 'B', 'C'], balance=balance_grouped, cash=cash_grouped, income=income_grouped, reverse_dates=True)

toolkit.ratios.collect_solvency_ratios()

It does return some data, but because so much is missing (which is the case for the others), it doesn't work for the others. The all ratios relies on being able to get at least one ratio for each category which it fails to do so hence the error. This can be more robust for sure but all in all, it really just requires updating of the normalisation file.

image

E.g. toolkit.ratios.get_debt_to_assets_ratio() works fine.

To account for all symbols you can do the following:

balance_grouped = pd.concat([balance] * 100).set_index(pd.MultiIndex.from_product([[str(x) for x in range(1, 101)], balance.index]))
cash_grouped = pd.concat([cash] * 100).set_index(pd.MultiIndex.from_product([[str(x) for x in range(1, 101)], cash.index]))
income_grouped = pd.concat([income] * 100).set_index(pd.MultiIndex.from_product([[str(x) for x in range(1, 101)], income.index]))

toolkit = Toolkit([str(x) for x in range(1, 101)], balance=balance_grouped, cash=cash_grouped, income=income_grouped, historical=pd.DataFrame(), reverse_dates=True)

toolkit.get_balance_sheet_statement()

Which returns

image

It's not pretty but it does work. Need to work on some proper error handling to account for missing tickers and values. When it comes to filings being different per company, I am afraid that's a step too far for me to handle as well. Normalisation is pretty important to begin with when it comes to comparison analysis.

JerBouma commented 1 year ago

Please update to v1.1.0 in which this is now fixed. You can now use the helper function to feed in a dictionary instead. Should you make it doable to loop over your data and create the structure as I show.

Screenshot 2023-07-20 at 10 17 16