dgunning / edgartools

Navigate SEC Edgar data in Python
MIT License
478 stars 97 forks

Financials Extraction from XBRL #74

Open dgunning opened 3 months ago

dgunning commented 3 months ago

Version 3 of XBRL financials

  1. Rewrite Financial extraction from XBRL
  2. Create a comprehensive test harness
  3. Document Financials V3

dgunning commented 3 months ago

Copied from Issue 73: For some 10-Q imports, some facts are missing when querying the facts table.

For example, the latest 10-Q (Q2 2024) for $GD contains rows for Costs of Products and Services (us-gaap:CostOfGoodsAndServicesSold), but this fact is never loaded into the facts table in edgartools and does not appear in the printed income statement.

The same applies to the "us-gaap:InterestIncomeExpenseNet" and "us-gaap:OtherNonoperatingIncomeExpense" facts.

This may be related to these fields having a role of "http://fasb.org/us-gaap/role/ref/legacyRef", while most of the facts that do get loaded have a role of "http://www.xbrl.org/2003/role/disclosureRef".

Progress so far

[Screenshot: GDFinancials]

dgunning commented 3 months ago

Note from https://github.com/emestee:

Hey,

If this helps, here are the entry points from the FASB taxonomy that group the line items in the mandatory filing statements:

https://xbrlview.fasb.org/yeti/resources/yeti-gwt/Yeti.jsp#tax~(id~174*v~10231)!con~(id~5267870)!net~(a~3474*l~832)!lang~(code~en-us)!path~(g~99043*p~0)!rg~(rg~32*p~12)

dgunning commented 2 months ago

@emestee what do you know about standardized statements vs as-reported statements? Do you know what defines the standard concepts that all companies include in their statements?

amitgandhinz commented 2 months ago

Circling back here as I've been playing with the new upgrades. Thanks for this. It looks like a lot of work went into the rewrite.

Were you able to pull in the productMembers as referenced here? https://github.com/dgunning/edgartools/issues/66#issuecomment-2243569825

I have been trying to pull in the concepts that feed into the Revenue Sales, but still can't figure that out correctly.

This is the code snippet I have (e.g., for the AAPL XBRL instance):

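    # ----
    # Extract detailed revenue items
    # ----
    # `instance` (the parsed XBRL instance) and `latest_date` (a date or
    # datetime) are defined earlier in the script.
    revenue_sources = []

    rev_dimensions = instance.dimensions
    rev_dimension_value = rev_dimensions['srt:ProductOrServiceAxis']
    facts = rev_dimension_value.get_facts()

    period_date_str = latest_date.strftime('%Y-%m-%d')

    # Keep only the facts for the latest three-month period.
    # Both conditions go into one mask instead of chaining two boolean
    # indexers, which makes pandas reindex the second mask and emit a warning.
    latest_period_facts = facts[
        (facts['end_date'] == period_date_str) & (facts['duration'] == '3 months')
    ]

    for index, row in latest_period_facts.iterrows():
        if row.concept.startswith('us-gaap:Revenue'):
            print(row.value, row.concept, row.dimensions)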

Running this against the AAPL Q2 10-Q, I get the following facts:

61564000000 us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax {'srt:ProductOrServiceAxis': 'us-gaap:ProductMember'}
24213000000 us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax {'srt:ProductOrServiceAxis': 'us-gaap:ServiceMember'}
39296000000 us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax {'srt:ProductOrServiceAxis': 'aapl:IPhoneMember'}
7009000000 us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax {'srt:ProductOrServiceAxis': 'aapl:MacMember'}
7162000000 us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax {'srt:ProductOrServiceAxis': 'aapl:IPadMember'}
8097000000 us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax {'srt:ProductOrServiceAxis': 'aapl:WearablesHomeandAccessoriesMember'}

The problem is that us-gaap:ProductMember actually represents the total of all the individual product lines, so I end up double counting the product values.

How do you determine that the aapl:* members are part of ProductMember, while ServiceMember is on its own without any nesting?

I want this product breakdown to work for any 10-Q so I can present where a company's revenue comes from.
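The double counting can be checked arithmetically with the values printed above. The following is only a rough sketch, not part of edgartools: it assumes that detail lines use the filer's own prefix (here `aapl:`) and that a standard member whose value equals their sum is a subtotal. That heuristic will not hold for every filing.

```python
# Values copied from the AAPL output above, keyed by srt:ProductOrServiceAxis member.
revenue_by_member = {
    "us-gaap:ProductMember": 61_564_000_000,
    "us-gaap:ServiceMember": 24_213_000_000,
    "aapl:IPhoneMember": 39_296_000_000,
    "aapl:MacMember": 7_009_000_000,
    "aapl:IPadMember": 7_162_000_000,
    "aapl:WearablesHomeandAccessoriesMember": 8_097_000_000,
}

# Assumption: company-specific (extension) members use the filer's own prefix,
# so anything outside "us-gaap:" is treated as a detail line.
extension_total = sum(
    value
    for member, value in revenue_by_member.items()
    if not member.startswith("us-gaap:")
)

# Heuristic: a standard member whose value equals the extension total
# is probably a subtotal of those detail lines.
subtotal_members = [
    member
    for member, value in revenue_by_member.items()
    if member.startswith("us-gaap:") and value == extension_total
]

naive_total = sum(revenue_by_member.values())
deduplicated_total = sum(
    value
    for member, value in revenue_by_member.items()
    if member not in subtotal_members
)

print(subtotal_members)    # ['us-gaap:ProductMember']
print(naive_total)         # 147341000000
print(deduplicated_total)  # 85777000000
```

Matching on values is fragile (two unrelated members can share a value), so it is better treated as a sanity check than as a way to discover the hierarchy.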

dgunning commented 2 months ago

You can see the dimensions using the dimensions attribute

[Screenshot: instance_dimensions]

And you can query by dimensions

[Screenshot: query_facts]

emestee commented 2 months ago

> @emestee what do you know about standardized statements vs as-reported statements? Do you know what defines the standard concepts that all companies include in their statements?

I actually don't know. I imagine it is SEC regulation derived from federal law and incorporating FASB rules. I also don't think it should matter. SEC filings are validated upon submission and can be assumed to be compliant with technical requirements (otherwise the SEC parser will reject the filing). You should not assign any specific meaning to any items in the statement, other than their relationships to parent items, if any.

dgunning commented 2 months ago

I think I want to add a parameter that switches between as_reported (the rows and labels that the company wants to show) and standard (a common set of values and labels that all companies are required to report).

I think Bloomberg operates like that, no?

amitgandhinz commented 2 months ago

That's kind of what I'm doing: I use this library to pull in the data, but then standardize things into my own fields.

You previously had a version of that in your old income statement code, but the challenge is getting a mapping of all the fields into a common set.
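A mapping like the one discussed here could be sketched as follows. This is a hypothetical illustration, not the edgartools API: the `STANDARD_LABELS` table, the `label_rows` function, its `standard` parameter, and the row labels and values are all made up for the example. The concept names are real us-gaap concepts.

```python
# Hypothetical mapping from reported XBRL concepts to standardized field names.
# Several concepts can map to the same standardized field.
STANDARD_LABELS = {
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": "Revenue",
    "us-gaap:Revenues": "Revenue",
    "us-gaap:CostOfGoodsAndServicesSold": "Cost of Revenue",
    "us-gaap:InterestIncomeExpenseNet": "Net Interest Income (Expense)",
}


def label_rows(rows, standard=False):
    """Return (label, value) pairs, as reported or with standardized labels.

    rows: iterable of (concept, reported_label, value) tuples.
    Concepts without a mapping keep their reported label.
    """
    labeled = []
    for concept, reported_label, value in rows:
        if standard:
            label = STANDARD_LABELS.get(concept, reported_label)
        else:
            label = reported_label
        labeled.append((label, value))
    return labeled


# Illustrative rows only; the labels and values are not taken from a filing.
rows = [
    ("us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax", "Net sales", 100),
    ("us-gaap:CostOfGoodsAndServicesSold", "Cost of sales", 60),
    ("example:SomeCompanySpecificItem", "Other item", 5),
]

print(label_rows(rows))                 # labels as the company reported them
print(label_rows(rows, standard=True))  # mapped labels where a mapping exists
```

The hard part remains what the comment says: building and maintaining the mapping table itself, since companies choose different concepts and add their own extensions.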

amitgandhinz commented 2 months ago

> You can see the dimensions using the dimensions attribute
>
> And you can query by dimensions

So with this, I am already getting the aapl:* dimensions. But see how there is also 'us-gaap:ProductMember' in the srt:ProductOrServiceAxis. This ProductMember happens to be the sum of all the aapl dimensions. Meanwhile, the ServiceMember in this example doesn't have sub-items. Is there a way to know if a dimension is a total of other sub-items? (When you pull up the XBRL viewer on the SEC site, the items get indented, so I assume the info is somewhere.)

My use case is to be able to pull the product revenue sources generically for all 10-K/10-Q imports, so not necessarily Apple-specific.
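In XBRL generally, nesting between members is recorded in the filing's linkbase files rather than in the instance, for example as definition arcs with the `domain-member` arcrole. A minimal sketch of reading such arcs with the standard library follows. The XML fragment is a hypothetical, heavily trimmed example written for illustration (it is not copied from Apple's filing), and `member_children` is a made-up helper, not an edgartools function.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed definition linkbase fragment.
SAMPLE_LINKBASE = """\
<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
               xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:definitionLink xlink:type="extended" xlink:role="http://example.com/role/Revenue">
    <link:loc xlink:type="locator" xlink:label="domain" xlink:href="srt.xsd#srt_ProductsAndServicesDomain"/>
    <link:loc xlink:type="locator" xlink:label="product" xlink:href="us-gaap.xsd#us-gaap_ProductMember"/>
    <link:loc xlink:type="locator" xlink:label="service" xlink:href="us-gaap.xsd#us-gaap_ServiceMember"/>
    <link:loc xlink:type="locator" xlink:label="iphone" xlink:href="aapl.xsd#aapl_IPhoneMember"/>
    <link:loc xlink:type="locator" xlink:label="mac" xlink:href="aapl.xsd#aapl_MacMember"/>
    <link:definitionArc xlink:type="arc" xlink:arcrole="http://xbrl.org/int/dim/arcrole/domain-member" xlink:from="domain" xlink:to="product"/>
    <link:definitionArc xlink:type="arc" xlink:arcrole="http://xbrl.org/int/dim/arcrole/domain-member" xlink:from="domain" xlink:to="service"/>
    <link:definitionArc xlink:type="arc" xlink:arcrole="http://xbrl.org/int/dim/arcrole/domain-member" xlink:from="product" xlink:to="iphone"/>
    <link:definitionArc xlink:type="arc" xlink:arcrole="http://xbrl.org/int/dim/arcrole/domain-member" xlink:from="product" xlink:to="mac"/>
  </link:definitionLink>
</link:linkbase>
"""

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
DOMAIN_MEMBER = "http://xbrl.org/int/dim/arcrole/domain-member"


def member_children(linkbase_xml):
    """Map each parent concept id to the list of its child concept ids."""
    root = ET.fromstring(linkbase_xml)
    children = defaultdict(list)
    for link in root.iter(f"{LINK}definitionLink"):
        # Locator labels are only unique within one extended link.
        concept_by_label = {
            loc.get(f"{XLINK}label"): loc.get(f"{XLINK}href").split("#")[-1]
            for loc in link.iter(f"{LINK}loc")
        }
        for arc in link.iter(f"{LINK}definitionArc"):
            if arc.get(f"{XLINK}arcrole") == DOMAIN_MEMBER:
                parent = concept_by_label[arc.get(f"{XLINK}from")]
                child = concept_by_label[arc.get(f"{XLINK}to")]
                children[parent].append(child)
    return dict(children)


hierarchy = member_children(SAMPLE_LINKBASE)

# A member that has children and is itself a child of something is a subtotal.
all_children = {child for kids in hierarchy.values() for child in kids}
subtotals = [parent for parent in hierarchy if parent in all_children]

print(subtotals)  # ['us-gaap_ProductMember']
```

Whether a given filing actually nests its members this way has to be checked per filing. The indentation in the SEC viewer comes from the presentation linkbase, which uses `parent-child` arcs with the same locator and arc structure.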

Ahmedmagdy31 commented 1 month ago

> That's kind of what I'm doing: I use this library to pull in the data, but then standardize things into my own fields.
>
> You previously had a version of that in your old income statement code, but the challenge is getting a mapping of all the fields into a common set.

I'm struggling with this standardization now. Would you please share the approach or some code to give me an idea? I'm new to financial data and the SEC but want to standardize financial statements for many companies. @amitgandhinz

Colem19 commented 1 month ago

The new version is working much better on the financial statements! Thanks a lot for that!

I think there is a small thing that could be improved a lot in the cash flow statement. If I look at Apple, it seems like a lot of the lines are positive instead of negative (like share repurchases).

david08-08 commented 1 month ago

@Colem19 @dgunning The statements are great; Dwight has done a great job. You are right though, good eye on identifying this issue. As an example, I pulled ULTA's most recent 10-Q via the TenQ class and identified a few more line items that should have a negative output, should have a positive output, or are in reverse order (meaning 2024 should be positive and 2023 should be negative). @dgunning do you anticipate that this will be fixed, or is it something that is fixable?

[Screenshot 2024-09-11 at 9 21 54 AM]

david08-08 commented 1 month ago

@Colem19 @dgunning To be clear, I highlighted in yellow what should be negative for both years, and I commented on what should be reversed and what should be positive.

dgunning commented 4 weeks ago

I verified and most values are in the right direction.

[Screenshot: UltraBeautyCashFlow]

Deferred Income Tax does not match the Filing table in the HTML.

The raw value in the XBRL instance file does not match, and I suspect there is a bug in the XBRL calculation file.

[Screenshot: UltraBeautyXbrl]

The calculation files usually have a weight of -1 for negative values, but it is missing in this case.

[Screenshot: weight]

There is not much the code can do in this case.
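To illustrate why a missing -1 weight matters, here is a minimal sketch of how calculation weights feed a roll-up. It is not how edgartools applies weights; the concept names, the values, and the `roll_up` helper are hypothetical. The one assumption taken from XBRL itself is that each calculation arc carries a weight, typically 1 or -1.

```python
def roll_up(parent, arcs, values):
    """Sum the children of `parent`, each multiplied by its arc weight.

    arcs: list of (parent, child, weight) tuples.
    values: dict mapping concept name to its raw reported value.
    """
    return sum(
        weight * values[child]
        for arc_parent, child, weight in arcs
        if arc_parent == parent
    )


# Illustrative raw values, both reported as positive numbers.
values = {"Inflows": 500, "Outflows": 200}

# Expected arcs: the outflow is subtracted from the total.
correct_arcs = [("NetCash", "Inflows", 1), ("NetCash", "Outflows", -1)]

# Same arcs with the -1 weight missing, as suspected in the filing above.
broken_arcs = [("NetCash", "Inflows", 1), ("NetCash", "Outflows", 1)]

print(roll_up("NetCash", correct_arcs, values))  # 300
print(roll_up("NetCash", broken_arcs, values))   # 700
```

A consumer could compare the rolled-up total with the reported total to flag a suspicious weight, but it cannot tell from the filing alone which of the two is wrong.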