everypolitician / compare_with_wikidata

Library for diffing Wikidata and CSVs
MIT License

Error when comparing CSVs #48

Status: Closed (chrismytton closed this issue 7 years ago)

chrismytton commented 7 years ago

Diffing the two CSVs in this gist fails with the following error:

$ daff.rb query.csv morph.csv
/Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/lib/lib/coopy/table_diff.rb:399:in `setup_moves': undefined method `length' for -1:Integer (NoMethodError)
        from /Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/lib/lib/coopy/table_diff.rb:976:in `hilite_single'
        from /Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/lib/lib/coopy/table_diff.rb:1079:in `hilite_with_nesting'
        from /Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/lib/lib/coopy/coopy.rb:279:in `run_diff'
        from /Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/lib/lib/coopy/coopy.rb:803:in `run'
        from /Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/lib/lib/coopy/coopy.rb:831:in `coopyhx'
        from /Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/lib/lib/coopy/coopy.rb:948:in `main'
        from /Users/chris/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/daff-1.3.19/bin/daff.rb:3:in `<top (required)>'
        from /Users/chris/.rbenv/versions/2.4.1/bin/daff.rb:22:in `load'
        from /Users/chris/.rbenv/versions/2.4.1/bin/daff.rb:22:in `<main>'

That comparison was run with the Ruby CLI version of daff, but we get the same error when running the comparison through compare_with_wikidata.

I think this has to do with the ordering of the CSV files. However, the command-line daff that comes with the Node version of daff works fine with these two CSVs and produces this diff.

chrismytton commented 7 years ago

Those CSV files come from this prompt page: https://www.wikidata.org/wiki/User:Oravrattas/prompts/Seat_Count

chrismytton commented 7 years ago

A potential workaround would be to detect this error and then fall back to sorting both input CSVs by the first column. This might mean the results come back in an odd order, but hopefully #49 and #50 will help with that somewhat.
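
A minimal sketch of that fallback, under some assumptions: each CSV has already been parsed into an array of rows with the header first, and the actual daff call is passed in as a block. Both `sort_by_first_column` and `diff_with_sort_fallback` are hypothetical names for illustration, not part of daff or compare_with_wikidata, and this is not necessarily how the eventual fix works.

```ruby
# Sort the data rows by their first column, keeping the header row on top.
# `rows` is an array of arrays, e.g. as returned by CSV.read.
def sort_by_first_column(rows)
  return rows if rows.empty?

  header, *body = rows
  [header] + body.sort_by { |row| row.first.to_s }
end

# Hypothetical helper: run the diff (given as a block) on the tables as-is.
# If it raises NoMethodError, as in the backtrace above, retry once with
# both tables sorted by their first column.
def diff_with_sort_fallback(left_rows, right_rows)
  yield left_rows, right_rows
rescue NoMethodError
  yield sort_by_first_column(left_rows), sort_by_first_column(right_rows)
end
```

The block would wrap whatever daff invocation compare_with_wikidata already uses. Rescuing `NoMethodError` is deliberately broad here, so it could also mask unrelated bugs inside the block. A real implementation might want to check the error message before retrying.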

chrismytton commented 7 years ago

Have opened an issue upstream for this: https://github.com/paulfitz/daff/issues/97.

chrismytton commented 7 years ago

This has been fixed in https://github.com/everypolitician/compare_with_wikidata/commit/dd79ce6b201da2e94bc3d2890d236b649f079932.