Closed david08-08 closed 2 months ago
It is hard to code this in a standard way so you will have to treat this a s one-off for Apple.
For example the name is "EarningsPerShareTables" for Apple but it will be different name for other companies.
Also I found the location of the html inside the statement at nested_table = earnings_per_share_df.iloc[1,1]
This code tries two ways to extract the html .. neither of which is perfect. This is exected - HTML is a display technology not a data container.
from io import StringIO
import pandas as pd
from edgar.documents import HtmlDocument
# Earnings per share
earnings_per_share = xbrl.statements.get('EarningsPerShareTables')
earnings_per_share_df = earnings_per_share.get_dataframe().reset_index()
earnings_per_share_df
# Nested table
nested_table = earnings_per_share_df.iloc[1,1]
def get_tables(text:str):
# Use pandas read_html
return pd.read_html(StringIO(text))
def get_tables2(text):
# Use HtmlDocument
doc = HtmlDocument.from_html(text)
return [
block.to_dataframe()
for block in
doc.get_table_blocks()
]
# Use pandas read_html
get_tables(nested_table)[0]
# Use HtmlDocument
get_tables2(nested_table)[0]
Thank you, Dwight. After running the code you provided it is still coming over like this. Have you tried this code on your end as well? If so did it work for you? Thank you.
Hi Dwight, sorry for the confusion this worked. Thank you for help. Just one question why do you think SegmentInfoandGeographi is not working? I assumed any tables I would be able to pull with the code provided. Is that not true?
I figured this out. The code you provide worked. Thank you Dwight.
Good afternoon I would like to know if it is possible to import the tables to the footnotes into excel. As an example the EarningsPerShare statement I have included a screen shot of how it is coming into excel.
The code I am using is below. I am trying to figure this out any help would be greatly appreciated. Thank you.
from edgar import from edgar.financials import from edgar.xbrl import * from pandas import ExcelWriter from pathlib import Path import pandas as pd from bs4 import BeautifulSoup
def extract_and_clean_tables(html_string): """ Function to extract and clean tables from the HTML content. """ soup = BeautifulSoup(html_string, 'html.parser') tables = soup.find_all('table') cleaned_tables = []
def write_statements(xbrl: XBRLData): output_path = Path('/Users/Documents/Apple Financials Test.xlsx') with output_path.open('wb') as f: with ExcelWriter(f, engine='xlsxwriter') as writer: workbook = writer.book number_format = workbook.add_format({'num_format': '0'}) # or '0.00' for two decimal places
Example usage:
if name == 'main': filing = Company("AAPL").get_filings(form="10-K").latest(1) xbrl = filing.xbrl() write_statements(xbrl)