dgunning / edgartools

Python library for working with SEC Edgar
MIT License
324 stars 70 forks

Accessing 'ITEM 1A': {'Title': 'Risk Factors',} #59

Closed mmistroni closed 3 weeks ago

mmistroni commented 1 month ago

Hello, this is not an issue, but I would like to know how to access Risk Factors from the XBRL. I cannot find any method that does that.

Kind regards, Marco

dgunning commented 1 month ago

Risk Factors from the XBRL, or Risk Factors from the 10-K/10-Q?

mmistroni commented 1 month ago

Hi, from the 10-K / 10-Q. I believe the commentary sections are not included in the XBRL, so I guess I have to find a better way to get them. Thinking about it, if I could access the HTML document, I could strip out the HTML tags and look for specific sections. Thanks. Let me know if I can help at all, as your code is really great!

vishwasg217 commented 1 month ago

Looking for something similar. Would love to have a feature to directly access each item's content in a 10-K report, preferably as HTML/Markdown.

mmistroni commented 1 month ago

Hello, one thing you can try (I tried it with forms 8-K and DEF 14A) is to get the text() from the filing and look for specific sections: take the substring and give it 'to the bot' to extract what you need. Something like the code below.

I haven't tried this with a 10-K, as 10-K text is much longer and more convoluted. I'll try this week, but if you have time please give it a go and let me know.

HTH, Marco

from edgar import Company

company = Company("WMT")
proxies = company.get_filings(form="DEF 14A")
random_proxy = proxies[0]
# There is too much info, so to narrow down the scope we look for well-defined sections.
txt = random_proxy.text()
start = txt.find('EXECUTIVE COMPENSATION TABLES')
exec_section_start = txt[start:]
# This normally comes after the compensation table.
end = exec_section_start.find('Deferred Compensation Plans')
mgmt_text = exec_section_start[0:end]
# print(mgmt_text)
# `model` is not defined in this snippet; it is assumed to be an LLM client configured elsewhere.
response = model.generate_content(f"Can you extract the names of persons listed in this text:{mgmt_text}")
print(response.text)
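One caveat with the marker-slicing approach above: `str.find` returns -1 when a marker is missing, and slicing with -1 silently gives the wrong substring instead of failing. A minimal sketch of a safer variant follows. The helper name `slice_between` and the sample text are hypothetical, made up for illustration, and not part of edgartools.

```python
from typing import Optional


def slice_between(text: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text from start_marker up to (not including) end_marker.

    Returns None if start_marker is absent. If end_marker is absent,
    returns everything from start_marker to the end of the text.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    # Search for the end marker only after the start marker.
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        return text[start:]
    return text[start:end]


# Made-up sample text standing in for the output of filing.text().
sample = (
    "Intro ... EXECUTIVE COMPENSATION TABLES Jane Doe, CEO ... "
    "Deferred Compensation Plans ... end"
)
section = slice_between(
    sample, "EXECUTIVE COMPENSATION TABLES", "Deferred Compensation Plans"
)
print(section)
```

With real filings, the markers differ between companies and years, so checking for `None` before passing the text on is probably worthwhile.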

mmistroni commented 4 weeks ago

Vishwas, there's a way.

You can get the full text submission like this:

company = Company("WMT")
tenks = company.get_filings(form="10-K")
tenks[0].full_text_submission()

I wrote some code a while ago to parse a 10-K, identify its sections, and extract sentiment. Give me a couple of days to resurrect it and I'll repost it to this thread. It's nothing too complicated, just some regexes etc.
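For illustration, a regex-based section split of the kind mentioned above might look roughly like the sketch below. This is not the code referred to in the comment. The pattern, the `split_items` helper, and the sample text are all assumptions, and real 10-K text is messier (for example, item headings also appear in the table of contents, which is why the sketch keeps the longest match per item).

```python
import re
from typing import Dict

# Assumed heading shape: a line starting with "Item 1.", "Item 1A.", "ITEM 7:", etc.
ITEM_HEADING = re.compile(
    r"^[ \t]*item[ \t]+(\d{1,2}[a-c]?)[.:]", re.IGNORECASE | re.MULTILINE
)


def split_items(text: str) -> Dict[str, str]:
    """Map 'Item N' labels to the text running up to the next item heading."""
    matches = list(ITEM_HEADING.finditer(text))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        key = "Item " + match.group(1).upper()
        body = text[match.start():end].strip()
        # A heading may occur twice (table of contents, then the real section);
        # keep the longest body seen for each item.
        if len(body) > len(sections.get(key, "")):
            sections[key] = body
    return sections


# Made-up sample standing in for the plain text of a 10-K.
sample = """Item 1. Business
We sell things.
Item 1A. Risk Factors
Competition may hurt results.
Item 1B. Unresolved Staff Comments
None.
"""
items = split_items(sample)
print(items["Item 1A"])
```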

dgunning commented 4 weeks ago

Try

wmt = Company("WMT")
tenk = wmt.get_filings(form="10-K").latest(1).obj()
tenk["Item 1A"]

mmistroni commented 4 weeks ago

Thanks, that works perfectly... I should have tried it myself before making any noise :(

dgunning commented 3 weeks ago

Closed as fixed