dgunning / edgartools

Navigate SEC Edgar data in Python
MIT License

Need help pulling 10 years of income statements, balance sheets, and cash flow statements #111

Open unparadise opened 2 months ago

unparadise commented 2 months ago

I am writing a script to pull 10 years of income statements, balance sheets, and cash flow statements based on a ticker parameter and encountered a few issues.

  1. I noticed that the DataFrame returned by financials.get_income_statement().get_dataframe() has four empty rows at the top: 'Income Statement [Abstract]', 'Statement [Table]', 'Product and Service', and 'Statement [Line Items]'. Here is my code:

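    from edgar import Company, set_identity

    set_identity("Your Name your.email@example.com")  # SEC EDGAR requires a contact identity
    company = Company('aapl')
    ten_k = company.get_filings(form="10-K").latest(1).obj()
    financials = ten_k.financials
    income_statement_df = financials.get_income_statement().get_dataframe()
    print(income_statement_df)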

    Should these empty rows be removed from the returned object?

  2. When I tried to pull 10 years of income statements for MSFT, I got 'ValueError: Length mismatch: Expected axis has 21 elements, new values have 22 elements'. Here is my code:

    
    import pandas as pd
    from edgar import Company, set_identity

    def get_financial_statements(ticker):
        year = 10
        set_identity("blah blah blah_@blah.com")
        company = Company(ticker)

        def get_income_statements():
            ten_ks = company.get_filings(form="10-K").latest(year)
            # Accumulator: one column per fiscal year
            income_statements_df = pd.DataFrame()

            for ten_k in ten_ks:
                financials = ten_k.obj().financials
                income_statement_df = financials.get_income_statement().get_dataframe()
                # Drop the last two columns so only the most recent year is kept
                # (a 10-K income statement also reports the two prior years)
                income_statement_df = income_statement_df.iloc[:, :-2]
                income_statements_df = pd.concat([income_statements_df, income_statement_df], axis=1)
            print(income_statements_df)

        get_income_statements()

    def main():
        get_financial_statements('MSFT')

    if __name__ == '__main__':
        main()

The full error message is pasted below.

    Traceback (most recent call last):
      File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 147, in <module>
        main()
      File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 143, in main
        get_financial_statements(args.ticker, statement)
      File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 54, in get_financial_statements
        get_income_statements()
      File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 45, in get_income_statements
        income_statements_df.index=income_statement_df.index
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/generic.py", line 6313, in __setattr__
        return object.__setattr__(self, name, value)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
      File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/generic.py", line 814, in _set_axis
        self._mgr.set_axis(axis, labels)
      File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 238, in set_axis
        self._validate_set_axis(axis, new_labels)
      File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/internals/base.py", line 98, in _validate_set_axis
        raise ValueError(
    ValueError: Length mismatch: Expected axis has 21 elements, new values have 22 elements
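The last frame of the traceback comes from assigning an index of 22 labels to a frame with 21 rows: pandas only accepts a replacement index of exactly the same length. Since an income statement can gain or lose line items from one 10-K to the next, lining years up by position tends to break like this. A minimal sketch with made-up toy frames (`fy1`, `fy2` and their labels are hypothetical) shows the failure and a label-based alternative, `pd.concat(..., axis=1)`, which matches rows by name and leaves NaN where a year has no such row. It assumes each label appears only once per year, since concat needs unique labels to align indexes that differ.

```python
import pandas as pd

# Toy yearly statements: FY2 has one extra line item ("Restructuring")
fy1 = pd.DataFrame({"FY1": [10.0, 4.0]}, index=["Revenue", "Net income"])
fy2 = pd.DataFrame(
    {"FY2": [12.0, 1.0, 5.0]},
    index=["Revenue", "Restructuring", "Net income"],
)

# Positional: overwriting the index requires exactly as many labels as rows
error_message = ""
broken = fy1.copy()
try:
    broken.index = fy2.index
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Label-based: rows are matched by name; a label missing in one year becomes NaN
combined = pd.concat([fy1, fy2], axis=1)
print(combined)
```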

Thank you for your help in advance!
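For point 1, a minimal pandas sketch that filters out rows with no values, assuming the header rows ('Income Statement [Abstract]', 'Statement [Table]', and so on) are blank in every value column of the DataFrame returned by get_dataframe(). `drop_empty_rows` is a hypothetical helper, not part of edgartools, and the numbers below are made up. If the real frame carries extra non-value columns, the optional `columns` argument restricts the check to the value columns.

```python
import pandas as pd

def drop_empty_rows(df: pd.DataFrame, columns=None) -> pd.DataFrame:
    """Return df without rows that are blank in every value column.

    A cell counts as blank if it is missing (None/NaN/NA) or a whitespace-only
    string. Pass `columns` to limit the check to specific value columns.
    """
    values = df if columns is None else df[columns]
    is_blank = values.apply(lambda col: col.isna() | col.astype(str).str.strip().eq(""))
    return df[~is_blank.all(axis=1)]

# Toy frame shaped like the header rows described in the issue (values made up)
statement = pd.DataFrame(
    {"2023": [None, None, None, None, 100.0], "2022": ["", None, "", None, 90.0]},
    index=[
        "Income Statement [Abstract]",
        "Statement [Table]",
        "Product and Service",
        "Statement [Line Items]",
        "Net sales",
    ],
)
print(drop_empty_rows(statement))
```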

dgunning commented 2 months ago

So I actually implemented this last week but have been slow rolling it out.

    from edgar import *

    set_identity("Your Name your.email@example.com")  # SEC EDGAR requires a contact identity

    company = Company("MSFT")
    filings = company.get_filings(form="10-K").latest(9)

    financials = MultiFinancials(filings)
    financials.get_balance_sheet()
    financials.get_cash_flow_statement()
    financials.get_income_statement()

unparadise commented 2 months ago

Thank you for your prompt reply, dgunning! I tried it and it works. Thank you!

But I noticed that some values are missing. For example, when pulling 10 years of income statements for MSFT, the revenue row shows <N/A> for 2015 and 2014.


Also, I noticed that many line items, such as R&D expense and SG&A expense, are missing from the DataFrame. Is this by design?


Colem19 commented 1 month ago

I think it has to do with the concatenation of the rows/DataFrames. It might be due to line items having slightly different labels from one year to the next. One way around it might be to drop each year's index and create a new one, but row order matters in this process, so I wonder what the best way to do it would be.
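The row-order concern can be sketched in pandas: build one master row order by walking each year's labels and slotting any label not seen before right after its predecessor, then reindex every year to that order before concatenating. `merge_row_order` and `combine_years` are hypothetical helpers and the frames are toy data. The sketch assumes unique labels within each year, and it does nothing about line items whose wording changed between years; those still appear as separate rows.

```python
import pandas as pd

def merge_row_order(label_lists):
    """Merge several ordered label lists into one order.

    A label not seen before is inserted right after the label that precedes
    it in its own list, so a new line item stays next to its neighbours.
    """
    order = []
    for labels in label_lists:
        previous = None
        for label in labels:
            if label not in order:
                position = order.index(previous) + 1 if previous is not None else 0
                order.insert(position, label)
            previous = label
    return order

def combine_years(frames):
    """Put yearly frames side by side, all reindexed to the merged row order."""
    order = merge_row_order([frame.index for frame in frames])
    return pd.concat([frame.reindex(order) for frame in frames], axis=1)

# Toy data: 2016 adds an "Impairment" line between cost of revenue and net income
fy2015 = pd.DataFrame(
    {"2015": [100.0, 60.0, 40.0]},
    index=["Revenue", "Cost of revenue", "Net income"],
)
fy2016 = pd.DataFrame(
    {"2016": [110.0, 65.0, 8.0, 37.0]},
    index=["Revenue", "Cost of revenue", "Impairment", "Net income"],
)

combined = combine_years([fy2015, fy2016])
print(combined)
```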

When I build Excel models, I add rows as new accounts appear and always keep them, even if they are removed in later years. There might be a workaround using some kind of SUMIF, or a specific mapping that assigns each account to a standard line and then refers to that mapping. But it might be hard given all the different ways financials are presented.
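The mapping idea can be sketched as a dictionary from each label variant to a canonical line, followed by a `groupby(...).sum()` that adds up rows sharing a canonical label, much like SUMIF in Excel. `LABEL_MAP` and `standardize` are hypothetical, and the labels and numbers are made up. A real mapping would also have to avoid sending a subtotal and its components to the same line, or the sum double-counts.

```python
import pandas as pd

# Hypothetical mapping from label variants to one canonical line item
LABEL_MAP = {
    "Product revenue": "Revenue",
    "Service and other revenue": "Revenue",
    "Research and development": "R&D expense",
    "Research and development expense": "R&D expense",
}

def standardize(frame: pd.DataFrame, label_map: dict) -> pd.DataFrame:
    """Map row labels to canonical names and sum rows that share a name (SUMIF-style)."""
    # Labels missing from the mapping are kept unchanged
    canonical = pd.Index([label_map.get(label, label) for label in frame.index])
    # sort=False keeps groups in first-seen order; min_count=1 keeps all-missing groups as NaN
    return frame.groupby(canonical, sort=False).sum(min_count=1)

# Toy yearly frames whose labels differ between years (numbers made up)
fy2015 = pd.DataFrame(
    {"2015": [60.0, 40.0, 12.0]},
    index=["Product revenue", "Service and other revenue", "Research and development"],
)
fy2016 = pd.DataFrame(
    {"2016": [110.0, 13.0]},
    index=["Revenue", "Research and development expense"],
)

combined = pd.concat(
    [standardize(fy2015, LABEL_MAP), standardize(fy2016, LABEL_MAP)], axis=1
)
print(combined)
```

Once both years use the same canonical labels, the concatenation lines up row by row instead of producing missing values for renamed items.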