bnsmith3 opened 7 years ago
Here's an example of a parseable OGE Form 278e Public Financial Disclosure Report (Steve Bannon: March 30, 2017). I believe this PDF format is output by a process on the OGE's Integrity.gov website, where everyone except Presidential candidates must file; candidates file with the FEC instead (source). So this parseable PDF format is our primary target.
We have observed one other format for OGE Form 278e, which appears to be a scan of a primary source document (Michael Flynn: March 31, 2017). Depending on how often we encounter these documents and whether they can be parsed, this format might be a secondary target.
President Barack Obama used the older OGE Form 278, and the available documents appear to be scans (Barack Obama: May 12, 2016). It's unclear how many historical records we'll encounter that were filed on Form 278 rather than 278e.
Judicial branch employees use a different format, from the Committee on Financial Disclosure in the Administrative Office of the United States Courts (Neil Gorsuch: August 11, 2016). This also appears to be a scan.
There is also an OGE Form 278-T for reporting periodic transactions (e.g., purchases and sales of stock). Here is an example (Charles F. Bolden Jr.: Feb 2, 2016). I would argue that this parseable format is the next most important target, because it is where potentially interesting and timely information could be uncovered.
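Transaction amounts on these forms are reported as dollar ranges rather than exact figures, so a 278-T parser will need to turn range strings into numbers. A minimal sketch, assuming the extracted text looks like `$1,001 - $15,000` or an open-ended `Over $50,000,000`; both patterns and the function name `parse_amount_range` are hypothetical and should be checked against real extracted text:

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: assume amounts appear as "$1,001 - $15,000"
# or as an open-ended "Over $50,000,000"; real extracted text may differ.
_RANGE = re.compile(r"\$([\d,]+)\s*-\s*\$([\d,]+)")
_OVER = re.compile(r"over\s+\$([\d,]+)", re.IGNORECASE)


def _to_int(s: str) -> int:
    """Convert a string like '15,000' to the integer 15000."""
    return int(s.replace(",", ""))


def parse_amount_range(text: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (low, high) in dollars; high is None for open-ended ranges.

    Returns None if the text matches neither assumed format.
    """
    m = _RANGE.search(text)
    if m:
        return _to_int(m.group(1)), _to_int(m.group(2))
    m = _OVER.search(text)
    if m:
        # "Over $X" gives only a lower bound.
        return _to_int(m.group(1)), None
    return None
```

For example, `parse_amount_range("$1,001 - $15,000")` gives `(1001, 15000)`. Keeping both bounds (instead of a midpoint) avoids inventing precision the filing does not contain.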
Regarding Ethics Agreements, for example (Rex Tillerson: Jan 3, 2017): these documents appear to be business letters rather than a standard parseable format. However, they do appear to contain important information, so it might be worth finding a way to link to these files.
There are at least two different types of files on the OGE site that could use parsing:
Right now, Form 278 can be parsed using this code, but the parser could be improved.
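Because several formats are in play (278, 278e, 278-T), a file likely needs to be routed to the right parser before parsing. A minimal sketch, assuming the PDF text has already been extracted and that the form number appears somewhere in it; the marker patterns and the name `guess_form_type` are hypothetical:

```python
import re

# Hypothetical markers: assume the extracted text contains the form number
# (e.g. near the top of the first page); verify against real samples.
# Order matters: more specific forms are checked before plain "278".
_FORM_PATTERNS = [
    ("278-T", re.compile(r"\b278[\s-]*T\b", re.IGNORECASE)),
    ("278e", re.compile(r"\b278e\b", re.IGNORECASE)),
    ("278", re.compile(r"\b278\b")),
]


def guess_form_type(text: str) -> str:
    """Return the first form type whose marker appears in the text.

    Returns 'unknown' when nothing matches, e.g. for a scanned PDF
    that yields no extractable text.
    """
    for form_type, pattern in _FORM_PATTERNS:
        if pattern.search(text):
            return form_type
    return "unknown"
```

Returning `"unknown"` for empty text gives scanned documents a natural fallback path (for example, storing only a link to the file) instead of failing.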
The output of a parsed file should be a JSON object that could be ingested by pretty much any other service or tool.
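One way to keep that JSON output consistent is to define the record shape once and serialize it with the standard library. A minimal sketch; every class and field name here (`ParsedReport`, `Transaction`, `source_url`, and so on) is a hypothetical schema, not an agreed format:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    # Hypothetical fields for one 278-T row; names are illustrative only.
    description: str
    transaction_type: str       # e.g. "Purchase" or "Sale"
    date: str                   # ISO 8601 string, e.g. "2016-01-15"
    amount_low: int
    amount_high: Optional[int]  # None (JSON null) for open-ended ranges


@dataclass
class ParsedReport:
    # Hypothetical top-level record. "source_url" points back to the
    # original PDF, so even unparsed files (scans, ethics letters) can
    # still be referenced.
    filer_name: str
    form_type: str
    source_url: str
    transactions: List[Transaction] = field(default_factory=list)


def to_json(report: ParsedReport) -> str:
    """Serialize a report, including nested transactions, to a JSON string."""
    # asdict() recursively converts nested dataclasses inside lists.
    return json.dumps(asdict(report), indent=2, sort_keys=True)
```

Any consumer can then read the output with a plain `json.loads`, without depending on this codebase.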