JerBouma / FinanceToolkit

Transparent and Efficient Financial Analysis
https://www.jeroenbouma.com/projects/financetoolkit
MIT License
2.74k stars 335 forks source link

collect_all_ratios(growth=True, lag=[1, 2, 3, 4, 5], trailing=4) doesnt return the correct value nor named columns #107

Closed sword134 closed 7 months ago

sword134 commented 7 months ago

Applying collect_all_ratios(growth=True, lag=[1, 2, 3, 4, 5], trailing=4) on a symbols list consisting of symbols = ["MSFT", "AAPL"] yields the following table:

image

The first "Accounts Payable Turnover Ratio" is simply the result of a .collect_all_ratios() with no arguments. However the next 4 are generated via the collect_all_ratios(growth=True, lag=[1, 2, 3, 4, 5], trailing=4). First of all none of these rows have a suffix, having a _lag1, _lag2, ..., _lag5 suffix would be a great improvement in QoL for users who concat the dataframes (like me), that way we dont lose track of which of the Accounts Payable Turnover Ratio is the non-growth, the lag 1 and so on.

Furthermore, I cant seem to figure out how the lag works? To me none of the values seem lagged. Shouldnt the value in 2023Q3 for a lag1 be equal to the value in 2023Q2 and the value in 2023Q3 for a lag2 should be equal to 2023Q1?

JerBouma commented 7 months ago

Hi @sword134,

I don't understand what you mean, all of these contain labels. If you desire it to contain the suffix lag_1 you can simply combine the two levels of the Multi Indices together to create labels. I can not accommodate everyone's specific use-case especially if it is resolved by relatively small work.

all_ratios = companies.ratios.collect_all_ratios(growth=True, lag=[1, 2, 3, 4, 5], trailing=4)

all_ratios.index = [all_ratios.index.get_level_values(0), all_ratios.index.get_level_values(1) + '_' + all_ratios.index.get_level_values(2)]

Which returns:

image

Let's say you have:

How lag works is as follows (which equals Econometric variables T-1, T-2, T-3 etc):

Therefore these values will not be the same given that their current value (T) changes over time. Now when you add in trailing=4 you will first calculate the TTM values (given that you have quarter=True) and then proceed to calculate the growth of these TTM values.

sword134 commented 7 months ago

Hi @sword134,

I don't understand what you mean, all of these contain labels. If you desire it to contain the suffix lag_1 you can simply combine the two levels of the Multi Indices together to create labels. I can not accommodate everyone's specific use-case especially if it is resolved by relatively small work.

all_ratios = companies.ratios.collect_all_ratios(growth=True, lag=[1, 2, 3, 4, 5], trailing=4)

all_ratios.index = [all_ratios.index.get_level_values(0), all_ratios.index.get_level_values(1) + '_' + all_ratios.index.get_level_values(2)]

Which returns:

image

Let's say you have:

  • Q1 23
  • Q2 23
  • Q3 23
  • Q4 23
  • Q1 24
  • Q2 24

How lag works is as follows (which equals Econometric variables T-1, T-2, T-3 etc):

  • Lag 1 = Q2 / Q1 and Q3 / Q2 and Q4 / Q3 etc (regular growth)
  • Lag 2 = Q4 / Q2 and Q1 / Q3 and Q2 / Q4 (growth with 1 skip)
  • Lag 3 = Q1 / Q2 ad Q2 / Q3 (growth with 2 skips)

Therefore these values will not be the same given that their current value (T) changes over time. Now when you add in trailing=4 you will first calculate the TTM values (given that you have quarter=True) and then proceed to calculate the growth of these TTM values.

This works sometimes but for some reason when I run the following symbol list through this code it returns a "IndexError: Too many levels: Index has only 1 level, not 2" It also returns alot of exceptions related to "There is an index name missing in the provided financial statements. This is "['QADB'] not in index". This is required for the function (get_price_earnings_ratio) to run. Please fill this column to be able to calculate the ratios.".

I suspect that the package fails at handling QADB

symbols = ['ACIW', 'ACVA', 'ADBE', 'ADEA', 'ADSK', 'AGYS', 'AKAM', 'ALKT', 'ALMFF', 'ALRM', 'ALTR', 'AMPL', 'AMSWA', 'ANSS', 'APP', 'APPF', 'APPN', 'APPS', 'ASAN', 'AUR', 'AUROW', 'AVDX', 'AVLR', 'AVPT', 'AVYA', 'AVYAW', 'AYX', 'AZPN', 'BAND', 'BASE', 'BIGC', 'BILL', 'BKI', 'BKKT', 'BL', 'BLKB', 'BMBL', 'BNFT', 'BOX', 'BRZE', 'BSY', 'BTRS', 'BTRSW', 'CALX', 'CBU', 'CCCS', 'CCSI', 'CDAY', 'CDK', 'CDNS', 'CFLT', 'CLDR', 'CLSK', 'COIN', 'COLB', 'COMP', 'COUP', 'CRM', 'CRNC', 'CRWD', 'CSGS', 'CTXS', 'CVLT', 'CVT', 'DBD', 'DBX', 'DCT', 'DDOG', 'DH', 'DOCN', 'DOCU', 'DOMO', 'DOX', 'DSP', 'DT', 'DUOL', 'DV', 'EB', 'ECOM', 'EGHT', 'ENV', 'EPAY', 'ESTC', 'ETWO', 'ETWO-WT', 'EVBG', 'EVCM', 'EVOP', 'EVTC', 'FEYE', 'FIBK', 'FICO', 'FIVN', 'FOUR', 'FROG', 'FRSH', 'FSLY', 'FTNT', 'GBOX', 'GBTG', 'GDDY', 'GEGGL', 'GEHC', 'GEN', 'GSKY', 'GTLB', 'GTYH', 'GWRE', 'HCP', 'HMBL', 'HOOD', 'HUBS', 'IBEX', 'INTA', 'INTU', 'JAMF', 'JCOM', 'KIND', 'KLDI', 'LAW', 'LAZR', 'LSTR', 'LVOX', 'LVOXU', 'LYFT', 'MANH', 'MANT', 'MCFE', 'MDB', 'MDLA', 'MITK', 'MKTW', 'ML', 'MNTV', 'MODN', 'MQ', 'MSFT', 'MSP', 'MSTR', 'MTTR', 'MYPS', 'NATI', 'NCNO', 'NET', 'NEWR', 'NLOK', 'NOW', 'NRDY', 'NTCT', 'NTNX', 'NUAN', 'OKTA', 'OLO', 'ORCL', 'PAR', 'PATH', 'PAYA', 'PAYC', 'PAYO', 'PCTY', 'PD', 'PDFS', 'PEGA', 'PFPT', 'PING', 'PLAN', 'PLTR', 'PLUS', 'PRGS', 'PRO', 'PRSI', 'PTC', 'PUBM', 'PWSC', 'PYCR', 'QADA', 'QADB', 'QLYS', 'QTWO', 'R', 'RELY', 'RIOT', 'RMNIW', 'RNG', 'RPAY', 'RPD', 'RUM', 'RUMBW', 'RWWI', 'RXT', 'S', 'SBSAA', 'SCWX', 'SEMR', 'SHCR', 'SMAR', 'SMRT', 'SNCRL', 'SNOW', 'SNPS', 'SOUN', 'SPLK', 'SPSC', 'SPT', 'SQ', 'SQSP', 'STEM', 'STER', 'STMP', 'SUMO', 'SVMK', 'SWI', 'TASK', 'TOGL', 'TTD', 'TWKS', 'TYL', 'U', 'UBER', 'UPBD', 'VERX', 'VMEO', 'VMW', 'VRNS', 'VRNT', 'VRSN', 'VYGVF', 'WDAY', 'WEX', 'WK', 'WORK', 'XM', 'YEXT', 'ZEN', 'ZI', 'ZIXI', 'ZS', 'ZUO']

companies = Toolkit(symbols, api_key=API_KEY, quarterly=True, start_date="2010-01-01", sleep_timer=True, progress_bar=True, remove_invalid_tickers=False)

custom_ratios = {
    "WC / Net Income as %": "(Working Capital / Net Income) * 100",
    "Working Capital Ratio": "Working Capital / Revenue",
}

df_ratios = companies.ratios.collect_all_ratios()
df_custom = companies.ratios.collect_custom_ratios(custom_ratios_dict=custom_ratios)
df_stats = companies.get_statistics_statement()

df_stock_price = companies.get_historical_data(fill_nan=False, period="daily")

df_ratios_growth_TTM = companies.ratios.collect_all_ratios(growth=True, lag=[1, 2, 3, 4, 5], trailing=4)
df_ratios_growth_TTM.index = [df_ratios_growth_TTM.index.get_level_values(0), df_ratios_growth_TTM.index.get_level_values(1) + '_' + df_ratios_growth_TTM.index.get_level_values(2)]

Running the exact same code without QADB works fine.

JerBouma commented 7 months ago

I'll look into QADB. There are a bunch of exceptions I am slowly catching up to. It's hard to test all 50k tickers that are constantly expanded!

sword134 commented 7 months ago

I'll look into QADB. There are a bunch of exceptions I am slowly catching up to. It's hard to test all 50k tickers that are constantly expanded!

Ofcourse, thats understandable. But isnt the optimal solution implementing a way to handle these troublesome tickers?

JerBouma commented 7 months ago

Yes, every time a ticker like this pops up I also build in an error catcher for it to also deal with the tickers that face similar issues.

JerBouma commented 7 months ago

This is an interesting rare case where there is fundamental data available but no historical data. This is because QADB is for Class B Shares which ceased to exist somewhere in 2021. However, you can of course still obtain the financial statements of QAD Inc. with that ticker. The correct ticker would be QADA. I'll patch this up but for your information that is the reason it fails.

JerBouma commented 7 months ago

Issue resolved in v1.8.2, it will now work despite QADB not having any data attached to it. The ratios calculated will simply be based on an empty set (as also indicated by the fact there was no data retrieved).

image