dgunning / edgartools

Python library for working with SEC Edgar
MIT License
324 stars, 70 forks

Bug: Not Capturing 10-K Item 1 text #45

Status: Closed (hadoopjax closed this issue 2 months ago)

hadoopjax commented 2 months ago

I've been running some tests on "Item 1" extraction across multiple symbols and have found a few where Item 1 doesn't get picked up by the code below.

Code:

from edgar import Company

symbol = "PR"  # EE also has this behavior if you need another test
filing = Company(symbol).get_filings(form="10-K").latest(1)

tenk = filing.obj()
output = tenk['Item 1']  # the Item 1 text is not captured for these symbols

dgunning commented 2 months ago

There are two different issues. For Permian Resources, the items are mostly accurate except for the first one, whose heading reads "Items 1 and 2". I will have to figure out how to handle that one.

[Image: permian]
from edgar import Filing

filing = Filing(company='Permian Resources Corp',
                cik=1658566,
                form='10-K',
                filing_date='2024-02-29',
                accession_no='0001658566-24-000018')
tenk = filing.obj()
tenk.items  # 'Item 1' and 'Item 2' are missing from the list below
['Item 1A',
 'Item 1B',
 'Item 1C',
 'Item 3',
 'Item 4',
 'Item 5',
 'Item 6',
 'Item 7',
 'Item 7A',
 'Item 8',
 'Item 9',
 'Item 9A',
 'Item 9B',
 'Item 9C',
 'Item 10',
 'Item 11',
 'Item 12',
 'Item 13',
 'Item 14',
 'Item 15',
 'Item 16']
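
To make the gap in the list above explicit, here is a minimal sketch that compares the detected items against the standard 10-K item labels. The `EXPECTED_ITEMS` list and the `missing_items` helper are assumptions written for this illustration and are not part of edgartools.

```python
# Standard 10-K item labels (an assumption for this sketch, not taken from edgartools).
EXPECTED_ITEMS = [
    "Item 1", "Item 1A", "Item 1B", "Item 1C", "Item 2", "Item 3", "Item 4",
    "Item 5", "Item 6", "Item 7", "Item 7A", "Item 8", "Item 9", "Item 9A",
    "Item 9B", "Item 9C", "Item 10", "Item 11", "Item 12", "Item 13",
    "Item 14", "Item 15", "Item 16",
]

# The value of tenk.items reported above for the Permian Resources filing.
detected = [
    "Item 1A", "Item 1B", "Item 1C", "Item 3", "Item 4", "Item 5", "Item 6",
    "Item 7", "Item 7A", "Item 8", "Item 9", "Item 9A", "Item 9B", "Item 9C",
    "Item 10", "Item 11", "Item 12", "Item 13", "Item 14", "Item 15", "Item 16",
]


def missing_items(found, expected=EXPECTED_ITEMS):
    """Return the expected item labels that are absent from `found`, in order."""
    found_set = set(found)
    return [item for item in expected if item not in found_set]


print(missing_items(detected))  # ['Item 1', 'Item 2']
```

Both missing labels are the ones covered by the combined "Items 1 and 2" heading.
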
dgunning commented 2 months ago

The second one, for Excelerate Energy, seems like a straightforward miss, so I will focus on that one first.

dgunning commented 2 months ago

The Excelerate fix is done and is being tested.

For Permian, I'm not sure how to handle it with the current item detection method.

[Image: items1and2]
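
One possible way to handle a combined heading, sketched here purely as an illustration: a regular expression that accepts both "Item N" and "Items N and M" and expands the match into individual labels. This is not the detection method edgartools uses, and it is not necessarily how the issue was fixed. The `expand_item_heading` function and the sample heading strings are hypothetical.

```python
import re

# Matches "Item 1A" as well as "Items 1 and 2" or "Items 1, 2 and 3"
# at the start of a heading line.
HEADING_RE = re.compile(
    r"^\s*Items?\s+(\d{1,2}[A-C]?(?:\s*(?:,|and|&)\s*\d{1,2}[A-C]?)*)",
    re.IGNORECASE,
)


def expand_item_heading(heading: str) -> list[str]:
    """Return the individual item labels named by a heading, or [] if none."""
    match = HEADING_RE.match(heading)
    if match is None:
        return []
    # Pull each item number (with optional letter suffix) out of the matched span.
    numbers = re.findall(r"\d{1,2}[A-C]?", match.group(1), flags=re.IGNORECASE)
    return [f"Item {number.upper()}" for number in numbers]


# Illustrative heading strings, not copied from the filing.
print(expand_item_heading("Items 1 and 2. Business and Properties"))
print(expand_item_heading("Item 1A. Risk Factors"))
```

With labels expanded this way, the same section text could be registered under both "Item 1" and "Item 2", so a lookup for either key would return the combined section.
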
dgunning commented 2 months ago

Fixed in 2.21.1