Open avivaprins opened 4 years ago
Transactions
`sum`, `mean`, and `standard deviation` of transaction amounts. Anything else?
Candidates
There isn't much to clean here, as we are considering using most of this information and the formatting is consistent.
Committees
@SaiArrow, can you look into the first two bullet points before our meeting on Friday?
General
* The FEC data are separated by `|` (instead of commas or tabs). How do we want to store our data? D3 can handle many import types, but are there any efficiency pros/cons? Our three main choices are `.txt` (`|` separated, as it is now), `.tsv` (tab separated), or `.json`.
* As mentioned below, we need to get a sense of acceptable scale: how big should these files be for reasonable processing speeds?
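To make the format question concrete, here is a minimal sketch of how the `|`-separated text could be converted to `.tsv` or `.json`, which also lets us compare output sizes. It uses only the Python standard library. The column names and sample rows are made up for illustration and are not the real FEC layout; the sketch also assumes the column names are supplied separately rather than read from the data file.

```python
import csv
import io
import json

def read_pipe_delimited(text, fieldnames):
    """Parse |-separated text into a list of dicts keyed by fieldnames."""
    # QUOTE_NONE: treat quote characters as ordinary data, not as field wrappers.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    return [dict(zip(fieldnames, row)) for row in reader]

def to_tsv(rows, fieldnames):
    """Serialize rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows)

# Hypothetical sample; real files would be read from disk instead.
sample = "C001|Committee A|150\nC002|Committee B|75\n"
columns = ["id", "name", "amount"]  # assumed names, not the FEC header

rows = read_pipe_delimited(sample, columns)
tsv_text = to_tsv(rows, columns)
json_text = to_json(rows)

# Rough size comparison of the three candidate formats, in characters.
print(len(sample), len(tsv_text), len(json_text))
```

JSON repeats every key in every record, so it will usually be the largest of the three. Whether that matters depends on the scale question in the second bullet.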
Update based on meeting, 10/30: we are not going to be preserving memos or any other attributes in transactions besides those mentioned above.
Therefore, the goal for this step in the cleaning is to aggregate transaction information and remove unused information.
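As a rough illustration of the aggregation step, the sketch below groups transactions by a key and computes the sum, mean, and standard deviation of the amounts. The field names (`committee`, `amount`) and the sample rows are hypothetical. Using the sample standard deviation (`statistics.stdev`) and reporting 0.0 for single-transaction groups are assumptions we would still need to agree on.

```python
import statistics
from collections import defaultdict

def aggregate_amounts(rows, key_field, amount_field):
    """Group rows by key_field and summarize amount_field per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key_field]].append(float(row[amount_field]))

    summary = {}
    for key, amounts in grouped.items():
        summary[key] = {
            "sum": sum(amounts),
            "mean": statistics.mean(amounts),
            # Sample standard deviation needs at least two values;
            # falling back to 0.0 for a single value is an assumption.
            "std": statistics.stdev(amounts) if len(amounts) > 1 else 0.0,
        }
    return summary

# Hypothetical transactions; amounts arrive as strings, as in a text file.
transactions = [
    {"committee": "C001", "amount": "100"},
    {"committee": "C001", "amount": "200"},
    {"committee": "C002", "amount": "50"},
]

summary = aggregate_amounts(transactions, "committee", "amount")
for key, stats in summary.items():
    print(key, stats)
```

Only the summary rows would need to be stored, which should shrink the transaction data considerably compared with keeping every record.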
TODO:
Don't forget to keep track of what gets filtered!
Step 1: determine the initial things we need to filter out (like loans and campaign expenditures).
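One possible way to filter while keeping track of what gets removed is sketched below. Each rule is a (reason, predicate) pair, and a counter records how many rows each rule dropped. The `kind` field and its values are placeholders; how loans and campaign expenditures are actually identified in the FEC data still has to be determined in Step 1.

```python
from collections import Counter

def filter_with_log(rows, rules):
    """Return (kept_rows, dropped_counts).

    rules is a list of (reason, predicate) pairs; a row is dropped by the
    first rule whose predicate returns True, and counted under that reason.
    """
    kept = []
    dropped = Counter()
    for row in rows:
        for reason, predicate in rules:
            if predicate(row):
                dropped[reason] += 1
                break
        else:
            # No rule matched, so the row survives the filter.
            kept.append(row)
    return kept, dropped

# Placeholder rules: the "kind" field is hypothetical.
rules = [
    ("loan", lambda r: r["kind"] == "loan"),
    ("campaign expenditure", lambda r: r["kind"] == "expenditure"),
]

rows = [
    {"kind": "contribution", "amount": "100"},
    {"kind": "loan", "amount": "5000"},
    {"kind": "expenditure", "amount": "250"},
    {"kind": "contribution", "amount": "40"},
]

kept, dropped = filter_with_log(rows, rules)
print(len(kept), dict(dropped))
```

Because every row is either kept or counted under exactly one reason, the kept count plus the dropped counts always adds up to the input size, which gives a quick sanity check after each cleaning pass.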