avivaprins / US-political-donations

MIT License

Iterative Cleaning: Part 1 (of n) #2

Open avivaprins opened 4 years ago

avivaprins commented 4 years ago

Don't forget to keep track of what gets filtered!

Step 1: determine the initial things we need to filter out (like loans and campaign expenditures).
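
A minimal sketch of how the filtering could keep a record of what it drops, so the counts can be reported later. The `type` field and the labels in `EXCLUDED_TYPES` are hypothetical placeholders; the actual transaction codes to exclude are not listed in this thread.

```python
from collections import Counter

# Hypothetical type labels (assumption): the real codes to exclude
# still need to be determined in this step.
EXCLUDED_TYPES = {"LOAN", "EXPENDITURE"}


def filter_transactions(rows, excluded_types=EXCLUDED_TYPES):
    """Return the kept rows plus a tally of filtered rows, keyed by type."""
    kept = []
    filtered_counts = Counter()
    for row in rows:
        if row["type"] in excluded_types:
            # Record the removal instead of dropping the row silently.
            filtered_counts[row["type"]] += 1
        else:
            kept.append(row)
    return kept, filtered_counts


# Made-up sample rows, for illustration only.
sample_rows = [
    {"type": "CONTRIBUTION", "amount": 250},
    {"type": "LOAN", "amount": 10000},
    {"type": "EXPENDITURE", "amount": 400},
    {"type": "CONTRIBUTION", "amount": 50},
]
kept_rows, filtered_counts = filter_transactions(sample_rows)
```

Returning the tally alongside the kept rows means each cleaning pass can append its counts to a running log of what was removed and why.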

avivaprins commented 4 years ago

General

Transactions

Candidates

There isn't much to clean here, as we are considering using most of this information and the formatting is consistent.

Committees

avivaprins commented 4 years ago

@SaiArrow, can you look into the first two bullet points before our meeting on Friday?

General

* The FEC data are delimited by `|` (instead of commas or tabs). How do we want to store our data? D3 can handle many import formats, but are there efficiency trade-offs between them? Our three main choices are `.txt` (`|`-delimited, as it is now), `.tsv` (tab-separated), or `.json`.

* As mentioned below, we need to get a sense of acceptable scale - how big should these files be for reasonable processing speeds?
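
To make the storage question concrete, here is a rough sketch that reads `|`-delimited records with Python's standard `csv` module and writes the same rows as TSV and as JSON. The field names and sample records are invented for illustration, and the sketch assumes the data file has no header row; neither is confirmed in this thread.

```python
import csv
import io
import json

# Hypothetical field names (assumption): not the real FEC column list.
FIELD_NAMES = ["committee_id", "name", "amount"]


def read_pipe_delimited(text_stream, field_names):
    """Parse '|'-delimited records into a list of dicts."""
    # QUOTE_NONE treats quote characters as ordinary text.
    reader = csv.reader(text_stream, delimiter="|", quoting=csv.QUOTE_NONE)
    return [dict(zip(field_names, record)) for record in reader]


def to_tsv(rows, field_names):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=field_names, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows)


# Made-up sample data standing in for a real file.
sample = io.StringIO("C001|SMITH, JANE|250\nC002|DOE, JOHN|50\n")
rows = read_pipe_delimited(sample, FIELD_NAMES)
tsv_text = to_tsv(rows, FIELD_NAMES)
json_text = to_json(rows)
```

One thing this makes visible: JSON repeats every field name on every record, so for wide, flat tables it will generally be larger on disk than the equivalent `|`- or tab-delimited file.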

avivaprins commented 4 years ago

Update based on meeting, 10/30: we will not preserve memos or any other transaction attributes besides those mentioned above.

Therefore, the goal for this step in the cleaning is to aggregate transaction information and remove unused information.
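
One possible shape for this step, as a sketch only: group transactions by a key, sum the amounts, and carry nothing else forward. The `committee_id`, `amount`, and `memo_text` field names are hypothetical, and grouping by committee is an assumption; the attributes we actually keep are the ones agreed on in the meeting.

```python
from collections import defaultdict


def aggregate_by_committee(rows):
    """Sum amounts and count transactions per committee.

    Only the grouping key and the amount are read, so every other
    attribute (memos included) is dropped from the output.
    """
    totals = defaultdict(lambda: {"total_amount": 0, "transaction_count": 0})
    for row in rows:
        entry = totals[row["committee_id"]]
        entry["total_amount"] += int(row["amount"])
        entry["transaction_count"] += 1
    return dict(totals)


# Made-up sample rows with hypothetical field names.
sample_rows = [
    {"committee_id": "C001", "amount": "250", "memo_text": "note"},
    {"committee_id": "C001", "amount": "50", "memo_text": ""},
    {"committee_id": "C002", "amount": "100", "memo_text": ""},
]
totals = aggregate_by_committee(sample_rows)
```

Aggregating before export also helps with the file-size question, since the output has one record per group rather than one per transaction.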

avivaprins commented 3 years ago

TODO: