WallStreetAnalytics / wallstreetanalytics

An endeavor to create an analytics tool to democratize the information hedge funds are creating teams to collect.

Mining EDGAR #6

Open ebolyen opened 3 years ago

ebolyen commented 3 years ago

EDGAR maintains indices of all SEC filings, which you can find documentation for here: https://www.sec.gov/edgar/searchedgar/accessing-edgar-data.htm

Something I have learned is that you can replace the .txt extension of a filing's path with -index.html, which gives you a far more parseable HTML file than the SGML the indices point to. (I also trust that SGML about as far as I can throw it; it contains PDFs and other blobs, so that's not far.) That said, parsing the SGML would give you the contents of the index with a single download.

example line from an index:

10-K        3COM CORP                                                     738076      2000-08-17  edgar/data/738076/0001005477-00-005922.txt          

URL-hacked index: https://www.sec.gov/Archives/edgar/data/738076/0001005477-00-005922-index.html


Disclaimer: I have no idea what this company is; I just grabbed a random line with a 10-K filing.
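
For anyone who wants to try this, here is a rough Python sketch of the URL hack above: it splits one index line into fields and swaps the .txt suffix of the file path for -index.html. The names `IndexEntry`, `parse_index_line`, and `filing_index_url` are made up for illustration, and the splitting rule (fields separated by two or more spaces) is an assumption based on the example line; a more careful parser would probably use the column offsets from the index file's header.

```python
import re
from typing import NamedTuple

# Root that the relative paths in the index are resolved against
# (taken from the example URL above).
ARCHIVES_ROOT = "https://www.sec.gov/Archives/"


class IndexEntry(NamedTuple):
    """One row of an EDGAR form index (hypothetical container)."""
    form_type: str
    company: str
    cik: str
    date_filed: str
    path: str


def parse_index_line(line: str) -> IndexEntry:
    # Assumption: columns are separated by runs of 2+ spaces, while
    # company names only contain single spaces.
    fields = re.split(r"\s{2,}", line.strip())
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return IndexEntry(*fields)


def filing_index_url(entry: IndexEntry) -> str:
    # Replace the trailing ".txt" with "-index.html" to get the HTML index page.
    if not entry.path.endswith(".txt"):
        raise ValueError(f"unexpected file name: {entry.path!r}")
    return ARCHIVES_ROOT + entry.path[: -len(".txt")] + "-index.html"


line = (
    "10-K        3COM CORP                                   "
    "738076      2000-08-17  edgar/data/738076/0001005477-00-005922.txt   "
)
entry = parse_index_line(line)
print(entry.cik, entry.date_filed)
print(filing_index_url(entry))
```

This only builds the URL; actually fetching the page is left out on purpose.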

michael-watson commented 3 years ago

@itsclaireh is it possible to get added to this repo and maybe make assignment stuff for this? I would like to explore exposing this data

DrewMcArthur commented 3 years ago

> @itsclaireh is it possible to get added to this repo and maybe make assignment stuff for this? I would like to explore exposing this data

@michael-watson you should be able to fork the repo, and once this gets off the ground and has some organizational stuff set up, you could file a pull request!

pdeneka commented 3 years ago

EDGAR is a subset of https://github.com/TheWallStreetAnalytics/wallstreetanalytics/issues/25 but could definitely use some help.