OpenTechStrategies / torque-sites

Open source code specific to OTS-managed Torque sites (usually client sites).

Add primary contact info and "reason for turndown" info to wiki, using CSV join. #38

Open kfogel opened 7 years ago

kfogel commented 7 years ago

We need to add the primary contact info and "reason for turndown" info to the wiki, using the supplementary CSV files recently received from MacFound (Principal-contact-join-20170716.csv and Reason-for-Turndown-join-2017-07-16.csv).

The way to do this is by joining them to 100andchangeExport-all-judges.csv (or one of its derivatives) using csvkit. E.g., something like:

$ csvcut -c Review_Number,Participant,Email Principal-contact-join-20170716.csv \
    | some sed command to combine Participant and Email into a single email address \
    > tmpfile.csv

$ csvjoin -c Review_Number filtered-100andchangeExport-all-judges.csv tmpfile.csv > new-100andchangeExport.csv
# Putting the two CSV files in that order will result in the new column(s) being
# last.  Still need to determine what happens if the two CSV files don't have the
# same number of rows, but suspect it works the same way as an SQL join.
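To see what a join on Review_Number does when the two files don't contain the same set of rows, here is a minimal pure-Python sketch contrasting an inner join (csvjoin's default when given -c; it also has --left, --right and --outer options) with a left join. The sample rows, the Organization and Primary_Contact column names, and the join helper are all hypothetical, and the helper assumes Review_Number is unique in the right-hand file.

```python
import csv
import io

# Illustrative data only; the real files have many more columns.
MAIN = """Review_Number,Organization
101,Alpha
102,Beta
103,Gamma
"""
CONTACTS = """Review_Number,Primary_Contact
101,Ann <ann@example.org>
103,Cy <cy@example.org>
104,Dee <dee@example.org>
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))

def join(left, right, key, keep_unmatched_left=False):
    """Join two lists of dict rows on `key`.

    Inner join by default: rows whose key appears on only one side are dropped.
    With keep_unmatched_left=True this is a left join: every left row is kept
    and the right-hand columns are "" where there is no match.
    Assumes keys are unique on the right-hand side.
    """
    index = {row[key]: row for row in right}
    right_cols = [c for c in right[0] if c != key] if right else []
    out = []
    for row in left:
        match = index.get(row[key])
        if match is None and not keep_unmatched_left:
            continue
        merged = dict(row)  # left columns first, so new columns end up last
        for col in right_cols:
            merged[col] = match[col] if match else ""
        out.append(merged)
    return out

main, contacts = read_rows(MAIN), read_rows(CONTACTS)
inner = join(main, contacts, "Review_Number")
left = join(main, contacts, "Review_Number", keep_unmatched_left=True)
print([r["Review_Number"] for r in inner])  # ['101', '103']
print([r["Review_Number"] for r in left])   # ['101', '102', '103']
```

Row 104, which exists only in the supplementary file, disappears in both cases; row 102 survives only in the left join, with an empty contact.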

The similar join with Reason-for-Turndown-join-2017-07-16.csv will be a little more complex, because 100andchangeExport-all-judges.csv already has a "Reason for Turndown" column; it's just empty in many cases. Basically, the supplementary CSV should replace that whole column.
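A plain csvjoin would bring in a second "Reason for Turndown" column alongside the existing one rather than fill in the existing one. Here is a minimal Python sketch of replacing the column in place instead, assuming Review_Number is unique in the supplementary file; the sample values and the replace_column helper are hypothetical.

```python
import csv
import io

# Illustrative data; the real export has many more columns.
EXPORT = """Review_Number,Organization,Reason for Turndown
101,Alpha,
102,Beta,Budget too large
103,Gamma,
"""
TURNDOWN = """Review_Number,Reason for Turndown
101,Scope unclear
102,Not scalable
"""

def replace_column(rows, supplement, key, column, keep_existing=False):
    """Return copies of `rows` with `column` taken from `supplement`.

    `supplement` maps key values to replacement text. By default the whole
    column is replaced (rows missing from the supplement become ""); with
    keep_existing=True the old value is kept when there is no replacement.
    The column keeps its original position because dict order is preserved.
    """
    out = []
    for row in rows:
        new = dict(row)
        fallback = row[column] if keep_existing else ""
        new[column] = supplement.get(row[key], fallback)
        out.append(new)
    return out

rows = list(csv.DictReader(io.StringIO(EXPORT)))
supplement = {r["Review_Number"]: r["Reason for Turndown"]
              for r in csv.DictReader(io.StringIO(TURNDOWN))}
updated = replace_column(rows, supplement, "Review_Number", "Reason for Turndown")

# Write the result back out as CSV with the original column order.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(updated)
print(buf.getvalue())
```

Whether rows absent from the supplementary file should end up empty or keep their old value is a judgment call; the keep_existing flag makes either choice a one-word change.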

These joins would go into wiki-refresh, probably right after the filtering step. (Again, though, check what happens when the number of rows is not the same between the two CSVs in a given join, and adjust accordingly.)

Then the csv2wiki-config file will have to have its col_map adjusted, of course.

kfogel commented 7 years ago

The error output promised in the above commit was:

[...]
Done with Stage 1 (sanitizing).

Stage 2: Filtering excluded Review Numbers...
Done with Stage 2 (filtering excluded Review Numbers).

Stage 3: Adding some supplemental data...
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "about" already exists in Table. Column will be renamed to "about_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "100" already exists in Table. Column will be renamed to "100_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Change" already exists in Table. Column will be renamed to "Change_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Name,Principal" already exists in Table. Column will be renamed to "Name,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_11".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Other,Principal" already exists in Table. Column will be renamed to "Other,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_12".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_13".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_14".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_15".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_16".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_17".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Expertise" already exists in Table. Column will be renamed to "Expertise_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Notes,Principal" already exists in Table. Column will be renamed to "Notes,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_18".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Dropdown,Principal" already exists in Table. Column will be renamed to "Dropdown,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_19".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Proposed" already exists in Table. Column will be renamed to "Proposed_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Solution" already exists in Table. Column will be renamed to "Solution_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_5".
Row 2 has 456 values, but Table only has 160 columns.
sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper name='/home/kfogel/private/work/ots/clients/macfound/eval-system/data/filtered-100andchangeExport-all-judges.csv.tmp' mode='rt' encoding='utf-8'>
Done with Stage 3 (joining CSVs to add supplemental data).

Creating wiki...
Traceback (most recent call last):
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1157, in <module>
    main()
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1120, in main
    csv_in = CSVInput(args[0], config)
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 975, in __init__
    self.headers = [None,] + next(self._csv_reader)
StopIteration
Done creating wiki.
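The StopIteration at the bottom of the traceback is what next() raises when a CSV reader has no rows at all, which suggests csv2wiki was handed an empty file after the Stage 3 join failed. Below is a minimal sketch of reading the header defensively so that an empty input produces a readable error; read_headers is a hypothetical helper, not csv2wiki's actual code.

```python
import csv
import io

def read_headers(csv_reader, source_name="input"):
    """Return the header row, or raise a readable error if the CSV is empty.

    next(reader, None) returns None instead of raising StopIteration
    when the reader is already exhausted.
    """
    headers = next(csv_reader, None)
    if headers is None:
        raise ValueError("{} is empty: no header row found".format(source_name))
    return headers

print(read_headers(csv.reader(io.StringIO("Review_Number,Organization\n101,Alpha\n"))))
# ['Review_Number', 'Organization']
try:
    read_headers(csv.reader(io.StringIO("")), "stage-3.csv")
except ValueError as err:
    print(err)  # stage-3.csv is empty: no header row found
```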
kfogel commented 7 years ago

Okay, the ugly error above is solved by commit 84f312d. Now to solve the email address problem mentioned in the log message for commit 2c4d4aaf.

kfogel commented 7 years ago

Important realization: whatever's causing the email-address doubling is not coming from fix-csv (since that runs in an earlier stage); it must come from either the mwclient library or the MediaWiki API itself.

One solution would be simply to use parentheses instead of angle brackets. But maybe the real trick is to use HTML entities (&lt; and &gt;) instead of literal angle brackets.
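A minimal sketch of the entity approach: build the "Name <email>" string and pass it through Python's standard html.escape() before it goes into the wiki text, so the page source contains &lt; and &gt; rather than literal angle brackets. The format_contact helper is hypothetical and only illustrates the idea; it is not necessarily how the eventual fix works.

```python
import html

def format_contact(name, email):
    """Combine a name and email as 'Name <email>', with the angle brackets
    (and any &, " or ' characters) turned into HTML entities for wiki text."""
    name, email = name.strip(), email.strip()
    if not email:
        return html.escape(name)
    return html.escape("{} <{}>".format(name, email))

print(format_contact("Jane Doe", "jane@example.org"))
# Jane Doe &lt;jane@example.org&gt;
```

The parentheses alternative would be the same function with "{} ({})" as the format string and no escaping of the brackets needed.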

kfogel commented 7 years ago

Okay, fixed in commit 17e41ed. I think this issue is done now, but the fix still needs testing on the full data set, and the production wikis need to be reloaded. Leaving the ticket open until all that's done.

kfogel commented 7 years ago

The above commit ac1facb is on the 38-join-supplemental-data-squash branch. That commit is a squash of all the previous commits for this issue on the 38-join-supplemental-data branch; it's what will get merged to master.

kfogel commented 7 years ago

Still can't close this. Using commit 082c1143eb on master, if I run with --pare=30 in the sanitization step, then wiki-refresh makes it through the big join:

csvjoin -c Review_Number                                              \
           "${DATA_DIR}"/tmp-"${STAGE_2_CSV}"                         \
           "${DATA_DIR}"/contact-and-turndown-tmp.csv                 \
         > "${DATA_DIR}"/"${STAGE_3_CSV}"

But if I run with no --pare (as one would for a full production run), then that step pauses for an insanely long time -- several minutes -- before finally spewing these errors:

[... the same DuplicateColumnWarning lines as in the Stage 3 output above, from "-_2" through "and_5" ...]
Row 2 has 405 values, but Table only has 160 columns.    

Not sure where the problem is yet. We had this kind of error before, and it was solved in commit 84f312d, as noted earlier in this issue. So why is it reappearing?

If the problem lies in one or more particular rows of the data, trying different (relatively prime) --pare values should zero in on it.
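The narrowing idea can be sketched as follows. This thread doesn't define --pare, so the sketch *assumes* that --pare=N keeps every Nth data row (rows whose 1-based position is a multiple of N), and that a single bad row triggers the failure on its own. Under those assumptions a bad row must be kept by every failing run and by no passing run, and relatively prime N values overlap only at multiples of their product, which keeps the candidate set small. candidate_rows is a hypothetical helper.

```python
def candidate_rows(total_rows, failing, passing):
    """Rows that could hold a single bad record, under the ASSUMED
    --pare=N semantics (keep rows whose 1-based index is a multiple of N).

    A bad row must be kept by every failing run and by no passing run.
    """
    return [
        i for i in range(1, total_rows + 1)
        if all(i % n == 0 for n in failing)
        and not any(i % n == 0 for n in passing)
    ]

print(candidate_rows(100, failing=[2, 3], passing=[5]))
# [6, 12, 18, 24, 36, 42, 48, 54, 66, 72, 78, 84, 96]
```

If the failure depends on the combination of rows rather than on any one row, this kind of narrowing won't converge.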

kfogel commented 7 years ago

(Well, the above error gives a clue anyway -- it says "Row 2". So that's the place to start, in the previous stage's CSV output.)
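One quick way to look at "Row 2" is to parse the previous stage's output with an explicit comma delimiter and report every row whose field count differs from the header's. This is a sketch; rows_with_wrong_width is a hypothetical helper, and it is not certain whether agate's "Row N" numbering matches the data-row numbering used here.

```python
import csv
import io

def rows_with_wrong_width(lines, delimiter=","):
    """Yield (row_number, field_count) for rows whose width differs from
    the header's. row_number is 1-based and counts data rows only."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, [])
    for number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            yield number, len(row)

sample = 'Review_Number,Organization\n101,Alpha\n102,"Beta, Inc."\n103,Gamma,extra\n'
print(list(rows_with_wrong_width(io.StringIO(sample))))  # [(3, 3)]
```

On the real file, pass an open file object instead, e.g. `open(path, newline="", encoding="utf-8")`, since the csv module expects files opened with `newline=""`.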

kfogel commented 7 years ago

ZOMG. Okay: I think it's a bug in Python agate's CSV sniffing functionality, at least with agate 1.6.0 in Python 3.5.

With --pare=19 the error reproduces every time:

Row 3 has 227 values, but Table only has 160 columns.

(Full error transcript given later.)

But if you add --snifflimit=0 (which disables sniffing) to the final csvjoin call in wiki-refresh, the error doesn't happen. So far the output looks fine, at least from spot-checking the CSV file and the resultant pages in my localhost wiki.

Interestingly, if I pass --snifflimit=30000000 (that is, far more bytes than the 2576441 bytes of the input CSV file, tmp-filtered-100andchangeExport-all-judges.csv), the error still happens. So the issue isn't the limit, it's the sniffing functionality itself. This is consistent with the fact that we see it with some --pare values (including but not limited to no paring) yet not others: if sniffing is broken in some interesting way, then the reproducibility of that breakage may depend on what exactly is in the sniff window. (This might even explain why --pare=37 gets intermittent errors, if there's a random-selection component to how agate.Table.from_csv() does sniffing.)
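The small Python illustration below shows how sniffing alone could produce this, using the standard library's csv.Sniffer. I'm assuming agate 1.6.0's sniffing is built on csv.Sniffer. That is my understanding, not something verified here, and the sample data is made up. The sniffer's first heuristic looks for quoted stretches bracketed by the same delimiter character on both sides. In a free-text field, escaped quotes like ' ""hi"" ' look like a quoted field bracketed by spaces. Nothing in this sample looks like a quoted field bracketed by commas, so the sniffer should guess a space.

That guess would fit the warnings in the transcript: column names such as "of", "Area", and "Name,Principal" look like a header split on spaces rather than commas. It might also explain why a larger --snifflimit doesn't help, since a bigger sample just gives the heuristic more free text to vote on. Passing the delimiter explicitly, which is roughly what --snifflimit=0 amounts to, sidesteps the guess.

```python
import csv

# Made-up sample: genuinely comma-delimited, but the free-text field
# contains escaped (doubled) quotation marks surrounded by spaces.
SAMPLE = (
    'id,text\n'
    '1,"He said ""hi"" to me"\n'
    '2,"She said ""bye"" twice"\n'
)


def sniffed_delimiter(sample):
    """Return the delimiter that csv.Sniffer guesses for `sample`."""
    return csv.Sniffer().sniff(sample).delimiter


def parse_header(sample, delimiter):
    """Parse just the header row of `sample` using the given delimiter."""
    return next(csv.reader(sample.splitlines(), delimiter=delimiter))


if __name__ == "__main__":
    guess = sniffed_delimiter(SAMPLE)
    print("sniffed delimiter:", repr(guess))
    print("header with sniffed delimiter:", parse_header(SAMPLE, guess))
    print("header with explicit comma:   ", parse_header(SAMPLE, ","))
```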

I traced this far by adding the --verbose flag to the second csvjoin invocation. The resultant error in full (again, this is with --pare=19) was:

/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "about" already exists in Table. Column will be renamed to "about_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "100" already exists in Table. Column will be renamed to "100_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Change" already exists in Table. Column will be renamed to "Change_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Name,Principal" already exists in Table. Column will be renamed to "Name,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Point" already exists in Table. Column will be renamed to "Point_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Contact" already exists in Table. Column will be renamed to "Contact_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Legal" already exists in Table. Column will be renamed to "Legal_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Status" already exists in Table. Column will be renamed to "Status_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_5".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_11".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Other,Principal" already exists in Table. Column will be renamed to "Other,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_12".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_13".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "/" already exists in Table. Column will be renamed to "/_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_14".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_15".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_16".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_6".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_17".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_7".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Expertise" already exists in Table. Column will be renamed to "Expertise_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Notes,Principal" already exists in Table. Column will be renamed to "Notes,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_18".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_8".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Dropdown,Principal" already exists in Table. Column will be renamed to "Dropdown,Principal_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Organization" already exists in Table. Column will be renamed to "Organization_19".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Primary" already exists in Table. Column will be renamed to "Primary_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Thematic" already exists in Table. Column will be renamed to "Thematic_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Area" already exists in Table. Column will be renamed to "Area_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_9".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Proposed" already exists in Table. Column will be renamed to "Proposed_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "Solution" already exists in Table. Column will be renamed to "Solution_2".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "-" already exists in Table. Column will be renamed to "-_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_3".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_4".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "of" already exists in Table. Column will be renamed to "of_10".
/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/utils.py:291: DuplicateColumnWarning: Column name "and" already exists in Table. Column will be renamed to "and_5".
Traceback (most recent call last):
  File "/usr/local/bin/csvjoin", line 11, in <module>
    load_entry_point('csvkit==1.0.3', 'console_scripts', 'csvjoin')()
  File "/usr/local/lib/python3.5/dist-packages/csvkit-1.0.3-py3.5.egg/csvkit/utilities/csvjoin.py", line 113, in launch_new_instance
  File "/usr/local/lib/python3.5/dist-packages/csvkit-1.0.3-py3.5.egg/csvkit/cli.py", line 114, in run
  File "/usr/local/lib/python3.5/dist-packages/csvkit-1.0.3-py3.5.egg/csvkit/utilities/csvjoin.py", line 64, in main
  File "/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/table/from_csv.py", line 88, in from_csv
  File "/usr/local/lib/python3.5/dist-packages/agate-1.6.0-py3.5.egg/agate/table/__init__.py", line 124, in __init__
ValueError: Row 3 has 227 values, but Table only has 160 columns.
Done with Stage 3 (joining CSVs to add supplemental data).

Creating wiki...
Traceback (most recent call last):
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1157, in <module>
    main()
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 1120, in main
    csv_in = CSVInput(args[0], config)
  File "/home/kfogel/private/work/ots/r/csv2wiki/csv2wiki", line 975, in __init__
    self.headers = [None,] + next(self._csv_reader)
StopIteration
Done creating wiki.
kfogel commented 7 years ago

Oh looky, still not done. Want to know why?

Because the default join is apparently the wrong kind (csvjoin does an inner join unless told otherwise). Any row whose review number is not among the keys in the supplemental data is omitted from the final output. That's clearly bad. It should be fixable with a left outer join and a default value of empty (or, in the case of Reason_For_Turndown, preservation of the existing value, if any).
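The minimal Python sketch below shows the join semantics described above, written over lists of dicts rather than files. It illustrates the behavior and is not necessarily what the eventual fix does. csvjoin has a --left option for a left outer join, but a plain join won't give you the "keep the existing Reason_For_Turndown when the supplement has none" rule. The function name and sample rows are hypothetical. The column names Review_Number, Participant_Email, and Reason_For_Turndown come from this issue. Unlike a SQL join, a duplicated key in the supplemental data doesn't produce extra rows here: the last matching supplemental row wins.

```python
def left_join(main_rows, supp_rows, key, supp_columns, preserve=()):
    """Left outer join of supp_rows onto main_rows on column `key`.

    Every main row is kept, in order. Each column in `supp_columns` is
    filled from the matching supplemental row, or left empty when there
    is no match. For columns listed in `preserve`, an empty supplemental
    value falls back to the main row's existing value. Columns that
    already exist in the main row keep their position; new ones are
    appended at the end.
    """
    # Index supplemental rows by key; if a key repeats, the last row wins.
    supp_by_key = {row[key]: row for row in supp_rows}
    joined = []
    for row in main_rows:
        match = supp_by_key.get(row[key], {})
        out = dict(row)
        for col in supp_columns:
            value = match.get(col, "")
            if not value and col in preserve:
                value = row.get(col, "")
            out[col] = value
        joined.append(out)
    return joined
```

Writing the result back out would be a matter of csv.DictWriter with fieldnames=list(joined[0]).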

More soon.

In the meantime, you can see the problem by running this in the DATA_DIR:

$ for name in *100andchangeExport-all-judges.csv
   do
     echo "${name}:"
     csvstat --count "${name}"
     echo ""
   done
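If csvstat isn't handy, the same check can be done with a Python sketch (stdlib only). The glob pattern is copied from the loop above, and UTF-8 input is an assumption. It counts CSV records rather than physical lines. That matters because quoted fields can contain newlines, in which case something like wc -l would overcount.

```python
import csv
import glob


def record_count(path):
    """Number of data records in a CSV file, not counting the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if any
        return sum(1 for _ in reader)


def report(pattern="*100andchangeExport-all-judges.csv"):
    """Print a record count for each matching file, like the shell loop."""
    for name in sorted(glob.glob(pattern)):
        print("%s:\nRow count: %d\n" % (name, record_count(name)))


if __name__ == "__main__":
    report()
```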

Here's the output:

100andchangeExport-all-judges.csv:
Row count: 1904

filtered-100andchangeExport-all-judges.csv:
Row count: 1885

joined-100andchangeExport-all-judges.csv:
Row count: 1082

processed-100andchangeExport-all-judges.csv:
Row count: 1082

sanitized-100andchangeExport-all-judges.csv:
Row count: 1904

tmp-filtered-100andchangeExport-all-judges.csv:
Row count: 1885
kfogel commented 7 years ago

Just to make the progression easier to see, I'm using this script now:

#!/bin/sh

for name in 100andchangeExport-all-judges.csv               \
            sanitized-100andchangeExport-all-judges.csv     \
            filtered-100andchangeExport-all-judges.csv      \
            tmp-filtered-100andchangeExport-all-judges.csv  \
            joined-100andchangeExport-all-judges.csv        \
            processed-100andchangeExport-all-judges.csv     \
; do
  echo "${name}:"
  csvstat --count "${name}"
  echo ""
done

The jump down from 1904 to 1885 is legit -- it's just because of the exclusions list. It's the jump from 1885 to 1082 that shouldn't be happening.

kfogel commented 7 years ago

Okay, commit 9ec4b92 should fix it. Leaving this issue open until we've had a chance to reload the production wikis, though.

Here's the new output from that inspection scriptlet:

100andchangeExport-all-judges.csv:
Row count: 1904

sanitized-100andchangeExport-all-judges.csv:
Row count: 1904

filtered-100andchangeExport-all-judges.csv:
Row count: 1885

tmp-filtered-100andchangeExport-all-judges.csv:
Row count: 1885

joined-100andchangeExport-all-judges.csv:
Row count: 1885

processed-100andchangeExport-all-judges.csv:
Row count: 1885

Since there are exactly 19 excluded proposals (see ${DATA_DIR}/excluded-review-numbers.txt), these numbers work out perfectly: 1904 - 19 = 1885. Manual inspection checks out so far, too. For example, proposal #8017 isn't mentioned in any of the supplemental data, but it still appears in the final spreadsheet and has, as expected, two empty fields at the end, Participant_Email and Reason_For_Turndown, right after the Pitch_Video_Link.
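The arithmetic above could be turned into a check that runs after each refresh. This is a hypothetical sketch and is not part of wiki-refresh. Every stage from filtering onward should have the raw row count minus the number of excluded review numbers. Feeding it the counts from before and after the fix flags the joined and processed stages only in the first case.

```python
def unexpected_counts(raw_count, excluded_count, post_filter_counts):
    """Return the post-filter stages whose row count isn't raw - excluded.

    post_filter_counts maps a stage name to its row count, for stages that
    run after the exclusion filter (filtered, joined, processed, ...).
    """
    expected = raw_count - excluded_count
    return [stage for stage, count in post_filter_counts.items()
            if count != expected]


if __name__ == "__main__":
    before_fix = {"filtered": 1885, "tmp-filtered": 1885,
                  "joined": 1082, "processed": 1082}
    after_fix = {"filtered": 1885, "tmp-filtered": 1885,
                 "joined": 1885, "processed": 1885}
    print(unexpected_counts(1904, 19, before_fix))  # ['joined', 'processed']
    print(unexpected_counts(1904, 19, after_fix))   # []
```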