Closed skabbit closed 6 years ago
I've found out, it is because of sql_mode, which is lead to error on ' ' dates.
Add c.execute("SET sql_mode = ''")
after line c = conn.cursor()
to prevent this error.
I think newer versions of mysql are stricter than the version that this script was originally developed with. Would prefer fixes to the queries instead of turning off strictness.
I was able to get past the issue by modifying line 153 to c.execute("INSERT IGNORE INTO contributions ". INSERT IGNORE will not insert records with invalid values.
Some of the data has extra columns in the addresses. For example:
3215737 Chico & Nunes, P.C. 333 West Wacker Drive, Suite 1650 333 West Wacker Drive Suite 1800 Chicago IL 60606 08/23/2010 0.0000 250.0000 Individual Contribution 07/01/2010 12/31/2010 Citizens for David E Miller 14923
3091186 Chico & Nunes, P.C. 333 W. Wacker Drive Suite 1650 Chicago IL 60606 07/15/2010 0.0000 1000.0000 Individual Contribution 07/01/2010 12/31/2010 Citizens for Maldonado 9533
Columns are separated by tabs. The first entry has a tab between the 2nd "333 West Wacker Drive" and "Suite 1800", so "Suite 1800" gets bumped to the raw_data.city
column of the DB, and every column after that is offset by one.
mysql> select * from raw_table WHERE reciept_id=3215737;
+------------+---------------------+------------+-----------------------------------+-----------------------+------------+---------+------+-------------+---------------+-------------+--------+--------------+-------------------------+------------+------------------+-------------------+------------------+------------------+-------------+--------------+------------+-------------+---------------+---------------+---------------------+-------------------+----------------+-----------------------------+
| reciept_id | last_name | first_name | address_1 | address_2 | city | state | zip | report_type | date_recieved | loan_amount | amount | receipt_type | employer | occupation | vendor_last_name | vendor_first_name | vendor_address_1 | vendor_address_2 | vendor_city | vendor_state | vendor_zip | description | election_type | election_year | report_period_begin | report_period_end | committee_name | committee_id |
+------------+---------------------+------------+-----------------------------------+-----------------------+------------+---------+------+-------------+---------------+-------------+--------+--------------+-------------------------+------------+------------------+-------------------+------------------+------------------+-------------+--------------+------------+-------------+---------------+---------------+---------------------+-------------------+----------------+-----------------------------+
| 3215737 | Chico & Nunes, P.C. | | 333 West Wacker Drive, Suite 1650 | 333 West Wacker Drive | Suite 1800 | Chicago | IL | 60606 | | 08/23/2010 | 0.0000 | 250.0000 | Individual Contribution | | | | | | | | | | | | | 07/01/2010 | 12/31/2010 | Citizens for David E Miller |
+------------+---------------------+------------+-----------------------------------+-----------------------+------------+---------+------+-------------+---------------+-------------+--------+--------------+-------------------------+------------+------------------+-------------------+------------------+------------------+-------------+--------------+------------+-------------+---------------+---------------+---------------------+-------------------+----------------+-----------------------------+
1 row in set (1.75 sec)
There are 6 columns like this. Since this is a demo the best answer might be to remove those rows from Illinois-campaign-contributions.txt
Or, alternately, run a DELETE FROM raw_table WHERE LENGTH(date_recieved)=0
once the import is complete.
Edit: It turns out some dates are also set to 0.0000
due to a missing tab character in the address section so DELETE FROM raw_table WHERE LENGTH(date_recieved) < 10
works better. We are deleting very few records in total with this statement - something like 10.
For now, some of data is not compatible with mysql_init_db.py script. Here is the traceback log of error:
Is it MySQL version problem? Or is it a because of downloaded data has changed?