Open kfogel opened 7 years ago
The error output promised in the above commit was:
[...]
Done with Stage 1 (sanitizing).
Stage 2: Filtering excluded Review Numbers...
Done with Stage 2 (filtering excluded Review Numbers).
Stage 3: Adding some supplemental data...
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "about" already exists in Table. Column will be renamed to "about_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "100" already exists in Table. Column will be renamed to "100_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Change" already exists in Table. Column will be renamed to "Change_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Name,Principal" already exists in Table. Column will be renamed to "Name,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_11".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Other,Principal" already exists in Table. Column will be renamed to "Other,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_12".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_13".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_14".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_15".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_16".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_17".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Expertise" already exists in Table. Column will be renamed to "Expertise_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Notes,Principal" already exists in Table. Column will be renamed to "Notes,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_18".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Dropdown,Principal" already exists in Table. Column will be renamed to "Dropdown,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_19".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Proposed" already exists in Table. Column will be renamed to "Proposed_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Solution" already exists in Table. Column will be renamed to "Solution_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_5".
Row 2 has 456 values, but Table only has 160 columns.
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/kfogel/private/work/ots/clients/macfound/eval-system/data/filtered-100andchangeExport-all-judges.csv.tmp' mode='rt' encoding='utf-8'>
Done with Stage 3 (joining CSVs to add supplemental data).
Creating wiki...
Traceback (most recent call last):
File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1157, in <module>
main()
File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1120, in main
csv_in = CSVInput(args[0], config)
File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 975, in __init__
self.headers = [None,] + next(self._csv_reader)
StopIteration
Done creating wiki.
Okay, the ugly error above is solved by commit 84f312d. Now to solve the email address problem mentioned in the log message for commit 2c4d4aaf.
Important realization: whatever's causing the email-address doubling is not coming from fix-csv (since that ran in an earlier stage); it must be coming from the mwclient library or from the MediaWiki API itself.
One solution would be just to try using parens instead of angle brackets. But maybe the real trick is to use entities for the angle brackets, instead of literal angle brackets.
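A minimal sketch of the entity idea, in Python since csv2wiki is Python. The helper name and the sample address are made up for illustration, and this is not necessarily how csv2wiki ends up handling it; it only shows literal angle brackets being swapped for entities before the text is sent on.

```python
import html

def entity_escape_email(name, email):
    """Render 'Name <email>' with the angle brackets as HTML entities.

    Hypothetical helper: the name and call shape are illustrative only.
    """
    # html.escape turns "<" into "&lt;" and ">" into "&gt;";
    # quote=False leaves quotation marks untouched.
    return "{} {}".format(name, html.escape("<{}>".format(email), quote=False))

print(entity_escape_email("Jane Doe", "jane@example.org"))
```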
Okay, fixed in commit 17e41ed. I think this issue is done now, but still needs testing on the full data set, and the production wikis need to be reloaded. Leaving the ticket open until all that's done.
The above commit ac1facb is on the 38-join-supplemental-data-squash branch. That commit is a squash of all the previous commits for this issue on the 38-join-supplemental-data branch; it's what will get merged to master.
Still can't close this. Using commit 082c1143eb on master, if I run with --pare=30 in the sanitization step, then wiki-refresh makes it through the big join:
csvjoin -c Review_Number \
"${DATA_DIR}"/tmp-"${STAGE_2_CSV}" \
"${DATA_DIR}"/contact-and-turndown-tmp.csv \
> "${DATA_DIR}"/"${STAGE_3_CSV}"
But if I run with no --pare (as one would for a full production run), then that step pauses for an insanely long time -- several minutes -- before finally spewing these errors:
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "about" already exists in Table. Column will be renamed to "about_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "100" already exists in Table. Column will be renamed to "100_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Change" already exists in Table. Column will be renamed to "Change_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Name,Principal" already exists in Table. Column will be renamed to "Name,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_11".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Other,Principal" already exists in Table. Column will be renamed to "Other,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_12".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_13".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_14".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_15".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_16".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_17".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Expertise" already exists in Table. Column will be renamed to "Expertise_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Notes,Principal" already exists in Table. Column will be renamed to "Notes,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_18".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Dropdown,Principal" already exists in Table. Column will be renamed to "Dropdown,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_19".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Proposed" already exists in Table. Column will be renamed to "Proposed_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Solution" already exists in Table. Column will be renamed to "Solution_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_5".
Row 2 has 405 values, but Table only has 160 columns.
Not sure where the problem is yet. We had these kinds of errors before, and that was solved in commit 84f312d, as noted here. So why is it reappearing?
If there's some issue in a particular row(s) of the data, trying different (relatively prime) --pare values will zero in on it.
(Well, the above error gives a clue anyway -- it says "Row 2". So that's the place to start, in the previous stage's CSV output.)
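One way to start looking at the previous stage's CSV output: parse it with a fixed dialect (no sniffing at all) and list any rows whose field count differs from the header's. The sketch below uses only the standard library csv module. The function name is made up, and the row numbering (data rows counted from 1) is my assumption; it may not line up with what agate means by "Row 2".

```python
import csv
import io

def find_ragged_rows(csv_file, limit=10):
    """Return (row_number, n_values, n_columns) for rows wider or narrower than the header.

    Hypothetical helper. Data rows are numbered from 1 (an assumption).
    """
    reader = csv.reader(csv_file)  # default "excel" dialect, no sniffing
    header = next(reader, None)
    if header is None:  # empty input: nothing to compare against
        return []
    ragged = []
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            ragged.append((row_number, len(row), len(header)))
            if len(ragged) >= limit:
                break
    return ragged

sample = io.StringIO('a,b,c\n1,2,3\n4,"5,5",6\n7,8\n')
print(find_ragged_rows(sample))
```

If a check like this comes back empty for the real file, the file is well-formed under plain comma parsing, which would point away from the data and toward how the reader guesses the dialect.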
ZOMG. Okay: I think it's a bug in Python agate's CSV sniffing functionality, at least with agate 1.6.0 in Python 3.5.
With --pare=19 the error reproduces every time:
Row 3 has 227 values, but Table only has 160 columns.
(Full error transcript given later.)
But if you add --snifflimit=0 (which disables sniffing) to the final csvjoin call in wiki-refresh, the error doesn't happen. So far the output looks fine, at least from spot-checking the CSV file and the resultant pages in my localhost wiki.
Interestingly, if I pass --snifflimit=30000000 (that is, far more bytes than the 2576441 bytes of the input CSV file, tmp-filtered-100andchangeExport-all-judges.csv), the error still happens. So the issue isn't the limit, it's the sniffing functionality itself. This is consistent with the fact that we see it with some --pare values (including but not limited to no paring) yet not others: if sniffing is broken in some interesting way, then the reproducibility of that breakage may depend on what exactly is in the sniff window. (This might even explain why --pare=37 gets intermittent errors, if there's a random-selection component to how agate.Table.from_csv() does sniffing.)
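To probe the sniffing hypothesis outside of csvjoin, something like the following might help. It assumes agate's sniffing behaves roughly like the standard library's csv.Sniffer, which I have not verified for agate 1.6.0, and the function names are illustrative only.

```python
import csv
import io

def sniff_delimiter(sample):
    """Return the delimiter csv.Sniffer guesses for `sample`, or None if it gives up."""
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        return None

def row_widths(text, delimiter=","):
    """Field count of each row when `text` is parsed with an explicit delimiter."""
    return [len(row) for row in csv.reader(io.StringIO(text), delimiter=delimiter)]

sample = "a,b,c\n1,2,3\n4,5,6\n"
print(sniff_delimiter(sample))  # the guess for this sniff window
print(row_widths(sample))       # widths with the delimiter forced to ","
```

Feeding different-sized prefixes of the stage 2 CSV to sniff_delimiter would show whether the guess ever comes back as something other than a comma; a wrong guess would be consistent with the "has N values, but Table only has 160 columns" error, though it would not prove that is what agate is doing.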
I traced this far by adding the --verbose flag to the second csvjoin invocation. The resultant error in full (again, this is with --pare=19) was:
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "about" already exists in Table. Column will be renamed to "about_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "100" already exists in Table. Column will be renamed to "100_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Change" already exists in Table. Column will be renamed to "Change_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Name,Principal" already exists in Table. Column will be renamed to "Name,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_11".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Other,Principal" already exists in Table. Column will be renamed to "Other,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_12".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_13".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_14".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_15".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_16".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_17".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Expertise" already exists in Table. Column will be renamed to "Expertise_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Notes,Principal" already exists in Table. Column will be renamed to "Notes,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_18".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Dropdown,Principal" already exists in Table. Column will be renamed to "Dropdown,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_19".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Proposed" already exists in Table. Column will be renamed to "Proposed_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Solution" already exists in Table. Column will be renamed to "Solution_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_5".
Traceback (most recent call last):
  File "/usr/local/bin/csvjoin", line 11, in <module>
    load_entry_point('csvkit==1.0.3', 'console_scripts', 'csvjoin')()
  File "/usr/local/lib/python3.5/dist-packages/csvkit-1.0.3-py3.5.egg/csvkit/utilities/csvjoin.py", line 113, in launch_new_instance
  File "/usr/local/lib/python3.5/dist-packages/csvkit-1.0.3-py3.5.egg/csvkit/cli.py", line 114, in run
  File "/usr/local/lib/python3.5/dist-packages/csvkit-1.0.3-py3.5.egg/csvkit/utilities/csvjoin.py", line 64, in main
  File "/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/table/from_csv.py", line 88, in from_csv
  File "/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/table/__init__.py", line 124, in __init__
ValueError: Row 3 has 227 values, but Table only has 160 columns.
Done with Stage 3 (joining CSVs to add supplemental data).
Creating wiki...
Traceback (most recent call last):
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1157, in <module>
    main()
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1120, in main
    csv_in = CSVInput(args[0], config)
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 975, in __init__
    self.headers = [None,] + next(self._csv_reader)
StopIteration
Done creating wiki.
Oh looky, still not done. Want to know why?
Because apparently the default join is the wrong kind: any row whose review number is not one of the keys in the supplemental data is now being omitted from the final output. This is clearly bad; it should be fixable with a different kind of join and a default value of empty (or, in the case of Reason_For_Turndown, preservation of the existing value if any).
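To make the difference between the two kinds of join concrete, here is a small sketch in plain Python (standard library only, no csvkit). The data is made up; only the column names echo the real files. It shows an inner-style join dropping unmatched rows, and a left-outer-style join keeping them with empty values.

```python
import csv
import io

# Made-up miniature data; only the column names echo the real files.
MAIN = "Review_Number,Title\n8017,Alpha\n8018,Beta\n8019,Gamma\n"
SUPPLEMENT = "Review_Number,Participant_Email\n8018,b@example.org\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def join(main, extra, key, keep_unmatched):
    """Join `extra` onto `main` by `key`.

    keep_unmatched=False acts like an inner join (unmatched rows vanish);
    keep_unmatched=True acts like a left outer join (blank-filled).
    """
    lookup = {row[key]: row for row in extra}
    extra_cols = [c for c in (extra[0] if extra else {}) if c != key]
    joined = []
    for row in main:
        match = lookup.get(row[key])
        if match is None and not keep_unmatched:
            continue  # this is where rows silently disappear
        out = dict(row)
        for col in extra_cols:
            out[col] = match[col] if match else ""
        joined.append(out)
    return joined


main_rows = read_rows(MAIN)
extra_rows = read_rows(SUPPLEMENT)
inner = join(main_rows, extra_rows, "Review_Number", keep_unmatched=False)
left = join(main_rows, extra_rows, "Review_Number", keep_unmatched=True)
print(len(main_rows), len(inner), len(left))
```

csvjoin itself has --left and --outer options for non-inner joins; this sketch makes no claim about which of those, if any, the eventual fix uses.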
More soon.
In the meantime, you can see the problem by running this in the DATA_DIR:
$ for name in *100andchangeExport-all-judges.csv
do
echo "${name}:"
csvstat --count ${name}
echo ""
done
Here's the output:
100andchangeExport-all-judges.csv:
Row count: 1904
filtered-100andchangeExport-all-judges.csv:
Row count: 1885
joined-100andchangeExport-all-judges.csv:
Row count: 1082
processed-100andchangeExport-all-judges.csv:
Row count: 1082
sanitized-100andchangeExport-all-judges.csv:
Row count: 1904
tmp-filtered-100andchangeExport-all-judges.csv:
Row count: 1885
Just to make the progression easier to see, I'm using this script now:
#!/bin/sh
# Print the row count of each pipeline stage's CSV, in pipeline order.
for name in 100andchangeExport-all-judges.csv \
            sanitized-100andchangeExport-all-judges.csv \
            filtered-100andchangeExport-all-judges.csv \
            tmp-filtered-100andchangeExport-all-judges.csv \
            joined-100andchangeExport-all-judges.csv \
            processed-100andchangeExport-all-judges.csv \
    ; do
  echo "${name}:"
  csvstat --count "${name}"
  echo ""
done
The jump down from 1904 to 1885 is legit -- it's just because of the exclusions list. It's the jump from 1885 to 1082 that shouldn't be happening.
Okay, commit 9ec4b92 should fix it. Leaving this issue open until we've had a chance to reload the production wikis, though.
Here's the new output from that inspection scriptlet:
100andchangeExport-all-judges.csv:
Row count: 1904
sanitized-100andchangeExport-all-judges.csv:
Row count: 1904
filtered-100andchangeExport-all-judges.csv:
Row count: 1885
tmp-filtered-100andchangeExport-all-judges.csv:
Row count: 1885
joined-100andchangeExport-all-judges.csv:
Row count: 1885
processed-100andchangeExport-all-judges.csv:
Row count: 1885
Since there are exactly 19 excluded proposals (see ${DATA_DIR}/excluded-review-numbers.txt), these numbers work out perfectly. So far, manual inspection checks out too. For example, proposal #8017 isn't mentioned in any of the supplemental data, but it still appears in the final spreadsheet, and has, as expected, two empty fields at the end, Participant_Email and Reason_For_Turndown, right after the Pitch_Video_Link.
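The arithmetic behind "work out perfectly" (1904 original rows minus 19 exclusions should leave 1885) could also be checked mechanically. Below is a hedged sketch in standard-library Python; count_rows and check_counts are hypothetical helper names, not part of wiki-refresh or csvkit. The counting goes through a CSV parser rather than counting lines, since quoted fields may contain newlines.

```python
import csv
import io


def count_rows(fileobj):
    """Count data rows (excluding the header) as a CSV parser sees them,
    so a quoted field with embedded newlines counts as one row."""
    reader = csv.reader(fileobj)
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)


def check_counts(original, excluded, final):
    """True if the final count equals the original minus the exclusions."""
    return final == original - excluded


# Tiny self-contained demo with a multi-line quoted field.
sample = io.StringIO('id,notes\n1,"line one\nline two"\n2,plain\n')
print(count_rows(sample))

# The figures reported in this thread, after and before the fix.
print(check_counts(original=1904, excluded=19, final=1885))
print(check_counts(original=1904, excluded=19, final=1082))
```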
We need to add the primary contact info and "reason for turndown" info to the wiki, using the supplementary CSV files recently received from MacFound (Principal-contact-join-20170716.csv and Reason-for-Turndown-join-2017-07-16.csv).
The way to do this is by joining it to 100andchangeExport-all-judges.csv (or one of its derivatives) using csvkit. E.g., something like:
The similar join process with Reason-for-Turndown-join-2017-07-16.csv will be a little more complex, because there is already a "Reason for Turndown" column in 100andchangeExport-all-judges.csv, it's just that in many cases it's empty; basically, the supplementary CSV should just replace that whole column.
These joins would go into wiki-refresh, probably right after the filtering step. (Again, though, check what happens when the number of rows is not the same between the two CSVs in a given join, and adjust accordingly.)
Then the csv2wiki-config file will have to have its col_map adjusted, of course.
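For the "Reason for Turndown" case, one possible policy is: take the supplemental value where one is given, and otherwise keep whatever the main file already has, without dropping any rows. Here is a minimal sketch of that policy in standard-library Python. The data is invented, merge_column is a hypothetical helper, and the column name follows the sanitized form (Reason_For_Turndown) used elsewhere in this thread.

```python
import csv
import io

# Made-up data; the real files are much larger and have more columns.
MAIN = (
    "Review_Number,Reason_For_Turndown\n"
    "1,\n"
    "2,existing reason\n"
    "3,old reason\n"
)
SUPPLEMENT = "Review_Number,Reason_For_Turndown\n1,new reason\n3,newer reason\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_column(main_rows, extra_rows, key, column):
    """Overwrite `column` with the supplemental value when one exists and
    is non-empty; otherwise leave the existing value untouched."""
    updates = {r[key]: r[column] for r in extra_rows if r[column]}
    merged = []
    for row in main_rows:
        out = dict(row)
        out[column] = updates.get(row[key], row[column])
        merged.append(out)
    return merged


merged = merge_column(
    read_rows(MAIN), read_rows(SUPPLEMENT), "Review_Number", "Reason_For_Turndown"
)
for row in merged:
    print(row["Review_Number"], repr(row["Reason_For_Turndown"]))
```

Every input row comes out the other side, so the row count is unaffected by how many review numbers the supplemental file happens to cover.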