WallStreetAnalytics / wallstreetanalytics

An endeavor to create an analytics tool that democratizes the information hedge funds are building teams to collect.
813 stars, 30 forks

Automate SEC File Properties into .txt or .csv #25

Open: pdeneka opened this issue 3 years ago

pdeneka commented 3 years ago

The SEC has ~30 different forms in EDGAR with various fields that need to be translated into properties. The forms are also available in XML format.

We need to generate a class file for each form, with its properties, constructors, etc.

It would be a massive help if you could automate converting those XML fields into C-whatever properties. The output file can be .txt or .csv as you see fit, and its naming scheme should match the file naming scheme.
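A minimal sketch of the XML-to-properties step, assuming each form is available as a local XML file. The helper names `leaf_fields` and `export_properties` are hypothetical. It treats every element without child elements as a candidate property, converts its tag to PascalCase (an assumption about the C# naming convention), and writes one .csv per XML file named after that file. Real EDGAR schemas vary by form, so treat the output as a starting list for review.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}periodOfReport'."""
    return tag.split("}", 1)[-1]


def leaf_fields(xml_path):
    """Return (element path, tag) for every element that has no child elements."""
    root = ET.parse(xml_path).getroot()
    fields = []

    def walk(elem, path):
        children = list(elem)
        if not children:
            fields.append(("/".join(path), local_name(elem.tag)))
        for child in children:
            walk(child, path + [local_name(child.tag)])

    walk(root, [local_name(root.tag)])
    return fields


def export_properties(xml_path, out_dir):
    """Write <xml file name>.csv listing each element path and a PascalCase property name."""
    xml_path = Path(xml_path)
    out_path = Path(out_dir) / (xml_path.stem + ".csv")
    seen = set()
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["xml_path", "property_name"])
        for path, tag in leaf_fields(xml_path):
            if path in seen:  # repeated elements (e.g. several transactions) are listed once
                continue
            seen.add(path)
            writer.writerow([path, tag[:1].upper() + tag[1:]])
    return out_path
```

To process a whole folder, loop over `Path("forms").glob("*.xml")` and call `export_properties` on each file (the folder names here are placeholders).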

Bonus points if you can also automate generating the entire class with constructors, getters, setters, and updates, ready for review and customization (1 .txt/.csv AND 1 C-whatever class file).
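For the bonus part, a rough sketch that turns a list of field names into C# source text, assuming C# is the target language and that every property is a `string` (both assumptions to adjust during review). `generate_csharp_class` is a hypothetical helper; C# auto-properties (`{ get; set; }`) supply the getters and setters, and the output includes an empty constructor plus one that sets every property.

```python
def to_pascal(name):
    return name[:1].upper() + name[1:]


def to_camel(name):
    return name[:1].lower() + name[1:]


def generate_csharp_class(class_name, fields):
    """Emit C# source for one form class; every property is typed as string (assumption)."""
    props = [to_pascal(f) for f in fields]
    lines = [f"public class {class_name}", "{"]
    for p in props:
        lines.append(f"    public string {p} {{ get; set; }}")  # auto-property: getter + setter
    lines.append("")
    lines.append(f"    public {class_name}() {{ }}")  # parameterless constructor
    lines.append("")
    params = ", ".join(f"string {to_camel(p)}" for p in props)
    lines.append(f"    public {class_name}({params})")  # constructor that sets every property
    lines.append("    {")
    for p in props:
        lines.append(f"        {p} = {to_camel(p)};")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, `generate_csharp_class("Form10K", ["formType", "periodOfReport"])` yields a class with `FormType` and `PeriodOfReport` properties. Field names that collide with C# keywords (such as `event` or `object`) would still need renaming or an `@` prefix by hand.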

This would also be a good learning project on how to read and write files, work with different data types (XML, or HTML if you prefer), and automate workloads.

You can read more about the different SEC filings at https://www.sec.gov/forms, and you can access SEC EDGAR through its homepage at https://www.sec.gov/edgar.shtml

pdeneka commented 3 years ago

BigBoiBri is working on this. I don't seem to be able to assign the task.

Iridium52 commented 3 years ago

EDGAR filing data is in XML format, but most of the useful information is in the HTML (and XBRL). The filing properties are mostly form type, filing number, period of report, etc.
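A minimal sketch of pulling those filing properties out of a submission header, assuming the header uses `KEY: value` lines like the SEC-HEADER block of EDGAR's full-text submission (.txt) files. The exact key names below (`CONFORMED SUBMISSION TYPE`, `SEC FILE NUMBER`, etc.) are an assumption to check against real filings, and `filing_properties` is a hypothetical helper.

```python
import re

# Assumed "KEY:<whitespace>value" layout; lines with no value (e.g. "FILER:") are skipped.
HEADER_LINE = re.compile(r"^\s*([A-Z][A-Z0-9 ()-]*?):\s+(\S.*?)\s*$")

# Assumed header keys mapped to the property names we want to keep.
WANTED = {
    "CONFORMED SUBMISSION TYPE": "form_type",
    "SEC FILE NUMBER": "file_number",
    "CONFORMED PERIOD OF REPORT": "period_of_report",
    "ACCESSION NUMBER": "accession_number",
}


def filing_properties(header_text):
    """Collect the wanted header values, keeping the first occurrence of each key."""
    props = {}
    for line in header_text.splitlines():
        match = HEADER_LINE.match(line)
        if not match:
            continue
        key, value = match.groups()
        name = WANTED.get(key)
        if name and name not in props:
            props[name] = value
    return props
```

Keeping only the first occurrence is a simplification: a header listing several filers can repeat keys like `SEC FILE NUMBER`, and those cases may need one record per filer instead.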

pdeneka commented 3 years ago

Correct, @Iridium52!

Those form types, filing numbers, periods of report, etc. are what we need. They will determine everything from the data structure and relationships to how we build the respective object classes. I'd rather not code 30 classes and 30 database tables by hand if I don't have to, and Bri needed some work.
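The same field lists could drive the database tables too. A rough sketch, using SQLite only as a stand-in for whatever database the project ends up with: the `FORMS` mapping, the table names, and the choice of `TEXT` for every column are hypothetical placeholders. Identifiers are validated because table and column names can't be passed as SQL parameters.

```python
import re
import sqlite3

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# Hypothetical mapping from one table per form to the columns extracted for it.
FORMS = {
    "form_10k": ["form_type", "file_number", "period_of_report"],
    "form_8k": ["form_type", "file_number", "period_of_report"],
}


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement; every column is stored as TEXT (an assumption)."""
    for name in [table, *columns]:
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    body = ",\n".join(["    id INTEGER PRIMARY KEY"] + [f"    {c} TEXT" for c in columns])
    return f"CREATE TABLE {table} (\n{body}\n);"


def create_all(conn, forms):
    """Create one table per form, then commit."""
    with conn:
        for table, columns in forms.items():
            conn.execute(create_table_sql(table, columns))
```

Running `create_all(sqlite3.connect(":memory:"), FORMS)` creates both tables; printing `create_table_sql(...)` instead gives DDL you can review before applying it to the real database.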