catalyst-cooperative / mozilla-sec-eia

Exploratory development for SEC to EIA linkage
MIT License

Read in CIK as string instead of int in the 10K metadata #20

Open katie-lamb opened 2 months ago

katie-lamb commented 2 months ago

I'm in the process of creating a training dataset and am realizing that it would be nice to have a primary key for the SEC 10K filing archive that refers to each filing uniquely. It seems like CIK is just the ID for a filing company, not a primary key for the filings. A few questions:

katie-lamb commented 1 month ago

In the EDGAR database, a company may have duplicate filings in the same year and quarter, perhaps because it refiled or resubmitted, so this is not a problem with the archiver. With FERC, we take the most recent filing for each company and year-quarter; we might do the same for the 10Ks.
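
A rough sketch of that "keep the most recent filing" step in pandas. The column names (`cik`, `year_quarter`, `filename`, `date_filed`) and the rows are made up for illustration and are not necessarily what the 10K metadata table actually contains:

```python
import pandas as pd

# Hypothetical metadata rows; column names and values are assumptions.
filings = pd.DataFrame(
    {
        "cik": ["0000001234", "0000001234", "0000005678"],
        "year_quarter": ["2020q1", "2020q1", "2020q1"],
        "filename": ["a.txt", "b.txt", "c.txt"],
        "date_filed": pd.to_datetime(["2020-02-01", "2020-03-15", "2020-02-20"]),
    }
)

# Sort by filing date, then keep only the last (most recent) row
# for each company and year-quarter.
latest = (
    filings.sort_values("date_filed")
    .drop_duplicates(subset=["cik", "year_quarter"], keep="last")
    .reset_index(drop=True)
)
```

Here the first company has two filings in the same quarter, and only the later one (`b.txt`) survives.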

Seems like the only real issue is to make the CIK a string instead of an int.
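
A minimal sketch of the difference, assuming the metadata is read with pandas from a CSV-like source with a `cik` column (both are assumptions; the sample rows are invented). CIKs are commonly written zero-padded to 10 digits, and reading them as integers drops the leading zeros:

```python
import io

import pandas as pd

# Invented sample data standing in for the 10K metadata.
csv_text = """cik,company_name,year_quarter
0000001234,Example Co A,2020q1
0001004155,Example Co B,2020q1
"""

# Default parsing infers an integer column and loses the zero padding.
as_int = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype keeps the CIK exactly as written.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"cik": "string"})

# If the column was already read as int, the padding can be restored.
restored = as_int["cik"].astype("string").str.zfill(10)
```

Setting the dtype at read time is the simpler option; `zfill` is only needed for data that has already been converted to integers.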