CargoSense / ex_csv

CSV for Elixir

Breaks when enumerating into a Map #2

Closed (peck closed this issue 9 years ago)

peck commented 9 years ago

I'm having trouble figuring out the right way to describe this, but it seems that something in the way the headings and body are stored or generated breaks the Access protocol for the Maps they are enumerated into.

While parsing some Atlanta crime history data, I noticed that the "id" field was not being populated when I enumerated the rows into a struct. The steps to reproduce are below. The unexpected result occurs at iex(11); the follow-ups show that it does not happen when the headings and the first body row are copied and pasted in manually.

iex(8)> {:ok, table} = File.read!("/Users/peck/Downloads/atl_crime_min.csv") |> ExCsv.parse(headings: true)
{:ok, %ExCsv.Table{body: [
  ["11027661", "090010061", "01/01/2009", "01/01/2009", "00:10:00", "01/01/2009", "00:20:00", "309", "NULL", "NULL", "501 TUFTON TRL SE", "0511", "NULL", "NULL", "1", "Morn", "Thu", "20", "BURGLARY", "Glenrose Heights", "Z", "-84.3843", "33.66477"],
  ["11027662", "090010108", "01/01/2009", "12/31/2008", "20:00:00", "01/01/2009", "01:00:00", "506", "NULL", "NULL", "360 MARIETTA ST NW", "0640", "NULL", "NULL", "2", "Eve", "Wed", "18", "LARCENY", "Downtown", "M", "-84.39702", "33.76269"],
  ["11027663", "090010133", "01/01/2009", "12/31/2008", "22:00:00", "12/31/2008", "22:30:00", "114", "NULL", "NULL", "547 FAIRBURN RD. NW", "0710", "NULL", "NULL", "1", "Eve", "Wed", "NULL", "AUTO THEFT", "Fairburn Heights", "H", "-84.50049", "33.7709"],
  ["11027664", "090010143", "01/01/2009", "12/31/2008", "16:00:00", "01/01/2009", "00:45:00", "609", "NULL", "NULL", "695 VERNON AVE SE", "0511", "NULL", "50", "1", "Eve", "Wed", "20", "BURGLARY", "Ormewood Park", "W", "-84.35802", "33.73499"],
  ["11027665", "090010153", "01/01/2009", "12/31/2008", "22:00:00", "01/01/2009", "22:30:00", "403", "NULL", "NULL", "1137 AVON AVE SW", "0710", "NULL", "NULL", "1", "Unk", "Thu", "13", "AUTO THEFT", "Oakland City", "S", "-84.4249", "33.72235"],
  ["11027666", "090010159", "01/01/2009", "01/01/2009", "00:57:00", "01/01/2009", "01:05:00", "205", "NULL", "NULL", "BUCKEAD AVE./GRANDVIEW AVE.", "0311", "NULL", "50", "1", "Morn", "Thu", "13", "ROBBERY", "Buckhead Village", "B", "-84.37574", "33.83752"],
  ["11027667", "090010247", "01/01/2009", "12/31/2008", "23:00:00", "01/01/2009", "01:35:00", "607", "NULL", "NULL", "195 ARIZONA AVE NE", "0640", "NULL", "50", "2", "Morn", "Thu", "13", "LARCENY", "Edgewood", "O", "-84.3312", "33.75902"],
  ["11027668", "090010267", "01/01/2009", "12/31/2008", "22:30:00", "01/01/2009", "01:30:00", "607", "NULL", "NULL", "58 SAUNDERS ST NE", "0640", "NULL", "50", "1", "Morn", "Thu", "20", "LARCENY", "Kirkwood", "O", "-84.32465", "33.75441"],
  ["11027669", "090010269", "01/01/2009", "01/01/2009", "00:35:00", "01/01/2009", "00:45:00", "403", "NULL", "NULL", "1145 PRINCESS AVE SW", "0511", "NULL", "NULL", "1", "Morn", "Thu", "20", "BURGLARY", "Oakland City", "S", "-84.42549", "33.72374"],
  ["11027670", "090010307", "01/01/2009", "12/31/2008", "22:45:00", "01/01/2009", "01:45:00", "402", "NULL", "NULL", "1530 RALPH D ABERNATHY BLVD SW", "0511", "NULL", "NULL", "1", "Morn", "Thu", "26", "BURGLARY", "Westview", "T", "-84.43767", "33.74307"],
  ["11027671", "090010314", "01/01/2009", "12/31/2008", "22:00:00", "01/01/2009", "01:45:00", "403", "NULL", "NULL", "14 BELMONTE CT SW", "0720", "NULL", "NULL", "1", "Morn", "Wed", "18", "AUTO THEFT", "Cascade Avenue/Road", "S", "-84.44497", "33.72544"],
  ["11027672", "090010350", "01/01/2009", "01/01/2009", "02:20:00", "01/01/2009", "02:23:00", "307", "NULL", "NULL", "585 MCWILLIAMS RD SE", "0720", "NULL", "NULL", "1", "Morn", "Thu", "18", "AUTO THEFT", "Browns Mill Park", "Z", "-84.36881", "33.68869"],
  ["11027673", "090010358", "01/01/2009", "01/01/2009", "00:30:00", "01/01/2009", "02:00:00", "104", "NULL", "NULL", "2174 PENELOPE ST NW", "0720", "NULL", "50", "1", "Morn", "Thu", "18", "AUTO THEFT", "Penelope Neighbors", "J", "-84.45718", "33.75449"],
  ["11027674", "090010359", "01/01/2009", "12/31/2008", "22:30:00", "01/01/2009", "02:30:00", "508", "NULL", "NULL", "174 PEACHTREE ST.", "0710", "NULL", "NULL", "1", "Morn", "Thu", "18", "AUTO THEFT", "Downtown", "M", "-84.39383", "33.74954"],
  ["11027675", "090010365", "01/01/2009", "01/01/2009", "02:30:00", "01/01/2009", "02:35:00", "413", "NULL", "NULL", "3458 DELMAR LN NW", "0710", "NULL", "NULL", "1", "Morn", "Thu", "18", "AUTO THEFT", "Adamsville", "H", "-84.50064", "33.75686"],
  ["11027676", "090010369", "01/01/2009", "12/31/2008", "22:00:00", "01/01/2009", "02:00:00", "204", "NULL", "NULL", "3200 LENOX RD NE", "0511", "NULL", "NULL", "1", "Morn", "Thu", "26", "BURGLARY", "Pine Hills", "B", "-84.35822", "33.84221"],
  ["11027677", "090010373", "01/01/2009", "01/01/2009", "02:00:00", "01/01/2009", "02:30:00", "507", "NULL", "NULL", "115 W PEACHTREE PL NW", "0440", "NULL", "NULL", "1", "Morn", "Thu", "13", "AGG ASSAULT", "Downtown", "M", "-84.3911", "33.7641"],
  ["11027678", "090010400", "01/01/2009", "01/01/2009", "01:00:00", "01/01/2009", "01:30:00", "509", "NULL", "NULL", "54 PEACHTREE ST SW", "0410", "NULL", "NULL", "1", "Morn", "Thu", "NULL", "AGG ASSAULT", "Downtown", "M", "-84.3912", "33.75286"],
  ["11027679", "090010406", "01/01/2009", "12/31/2008", "07:00:00", "01/01/2009", "02:30:00", "311", "NULL", "NULL", "1083 ASTOR AVE SW", "0531", "NULL", "NULL", "1", "Unk", "Wed", "20", "BURGLARY", "Sylvan Hills", "X", "-84.42404", "33.70171"]],
 headings: ["id", "offense_id", "rpt_date", "occur_date", "occur_time", "poss_date", "poss_time", "beat", "apt_office_prefix", "apt_office_num", "location", "minofucr", "minofibr_code", "dispo_code", "maxofnum_victims", "shift", "avg_day", "loc_type", "uc2 literal", "neighborhood", "npu", "x", "y"],
 row_mapping: nil, row_struct: nil}}
iex(9)> after_zip = table.headings |> Enum.zip(table.body |> hd) |> Enum.into %{}
%{"apt_office_num" => "NULL", "apt_office_prefix" => "NULL", "avg_day" => "Thu", "beat" => "309", "dispo_code" => "NULL", "loc_type" => "20", "location" => "501 TUFTON TRL SE", "maxofnum_victims" => "1", "minofibr_code" => "NULL", "minofucr" => "0511", "neighborhood" => "Glenrose Heights", "npu" => "Z", "occur_date" => "01/01/2009", "occur_time" => "00:10:00", "offense_id" => "090010061", "poss_date" => "01/01/2009", "poss_time" => "00:20:00", "rpt_date" => "01/01/2009", "shift" => "Morn", "uc2 literal" => "BURGLARY", "x" => "-84.3843", "y" => "33.66477", "id" => "11027661"}
iex(10)> Map.keys(after_zip)
["apt_office_num", "apt_office_prefix", "avg_day", "beat", "dispo_code", "loc_type", "location", "maxofnum_victims", "minofibr_code", "minofucr", "neighborhood", "npu", "occur_date", "occur_time", "offense_id", "poss_date", "poss_time", "rpt_date", "shift", "uc2 literal", "x", "y", "id"]
iex(11)> after_zip["id"]
nil
iex(12)> after_zip["beat"]
"309"
iex(13)> table.headings
["id", "offense_id", "rpt_date", "occur_date", "occur_time", "poss_date", "poss_time", "beat", "apt_office_prefix", "apt_office_num", "location", "minofucr", "minofibr_code", "dispo_code", "maxofnum_victims", "shift", "avg_day", "loc_type", "uc2 literal", "neighborhood", "npu", "x", "y"]
iex(14)> table.body |> hd
["11027661", "090010061", "01/01/2009", "01/01/2009", "00:10:00", "01/01/2009", "00:20:00", "309", "NULL", "NULL", "501 TUFTON TRL SE", "0511", "NULL", "NULL", "1", "Morn", "Thu", "20", "BURGLARY", "Glenrose Heights", "Z", "-84.3843", "33.66477"]
iex(15)> copy_pasted_map = Enum.zip(["id", "offense_id", "rpt_date", "occur_date", "occur_time", "poss_date",
...(15)> "poss_time", "beat", "apt_office_prefix", "apt_office_num", "location",
...(15)> "minofucr", "minofibr_code", "dispo_code", "maxofnum_victims", "shift",
...(15)> "avg_day", "loc_type", "uc2 literal", "neighborhood", "npu", "x", "y"], ["11027661", "090010061", "01/01/2009", "01/01/2009", "00:10:00", "01/01/2009",
...(15)> "00:20:00", "309", "NULL", "NULL", "501 TUFTON TRL SE", "0511", "NULL", "NULL",
...(15)> "1", "Morn", "Thu", "20", "BURGLARY", "Glenrose Heights", "Z", "-84.3843",
...(15)> "33.66477"]) |> Enum.into %{}
%{"apt_office_num" => "NULL", "apt_office_prefix" => "NULL", "avg_day" => "Thu", "beat" => "309", "dispo_code" => "NULL", "id" => "11027661", "loc_type" => "20", "location" => "501 TUFTON TRL SE", "maxofnum_victims" => "1", "minofibr_code" => "NULL", "minofucr" => "0511", "neighborhood" => "Glenrose Heights", "npu" => "Z", "occur_date" => "01/01/2009", "occur_time" => "00:10:00", "offense_id" => "090010061", "poss_date" => "01/01/2009", "poss_time" => "00:20:00", "rpt_date" => "01/01/2009", "shift" => "Morn", "uc2 literal" => "BURGLARY", "x" => "-84.3843", "y" => "33.66477"}
iex(16)> Map.keys(copy_pasted_map)
["apt_office_num", "apt_office_prefix", "avg_day", "beat", "dispo_code", "id", "loc_type", "location", "maxofnum_victims", "minofibr_code", "minofucr", "neighborhood", "npu", "occur_date", "occur_time", "offense_id", "poss_date", "poss_time", "rpt_date", "shift", "uc2 literal", "x", "y"]
iex(17)> copy_pasted_map["id"]
"11027661"

atl_crime_min.csv:
id,offense_id,rpt_date,occur_date,occur_time,poss_date,poss_time,beat,apt_office_prefix,apt_office_num,location,minofucr,minofibr_code,dispo_code,maxofnum_victims,shift,avg_day,loc_type,uc2 literal,neighborhood,npu,x,y
11027661,090010061,01/01/2009 ,01/01/2009 ,00:10:00 ,01/01/2009 ,00:20:00 ,309,NULL,NULL,501 TUFTON TRL SE,0511,NULL,NULL,1,Morn,Thu,20,BURGLARY,Glenrose Heights,Z,-84.3843,33.66477
11027662,090010108,01/01/2009 ,12/31/2008 ,20:00:00 ,01/01/2009 ,01:00:00 ,506,NULL,NULL,360 MARIETTA ST NW,0640,NULL,NULL,2,Eve,Wed,18,LARCENY,Downtown,M,-84.39702,33.76269
11027663,090010133,01/01/2009 ,12/31/2008 ,22:00:00 ,12/31/2008 ,22:30:00 ,114,NULL,NULL,547 FAIRBURN RD. NW,0710,NULL,NULL,1,Eve,Wed,NULL,AUTO THEFT,Fairburn Heights,H,-84.50049,33.7709
11027664,090010143,01/01/2009 ,12/31/2008 ,16:00:00 ,01/01/2009 ,00:45:00 ,609,NULL,NULL,695 VERNON AVE SE,0511,NULL,50,1,Eve,Wed,20,BURGLARY,Ormewood Park,W,-84.35802,33.73499
11027665,090010153,01/01/2009 ,12/31/2008 ,22:00:00 ,01/01/2009 ,22:30:00 ,403,NULL,NULL,1137 AVON AVE SW,0710,NULL,NULL,1,Unk,Thu,13,AUTO THEFT,Oakland City,S,-84.4249,33.72235
11027666,090010159,01/01/2009 ,01/01/2009 ,00:57:00 ,01/01/2009 ,01:05:00 ,205,NULL,NULL,BUCKEAD AVE./GRANDVIEW AVE.,0311,NULL,50,1,Morn,Thu,13,ROBBERY,Buckhead Village,B,-84.37574,33.83752
11027667,090010247,01/01/2009 ,12/31/2008 ,23:00:00 ,01/01/2009 ,01:35:00 ,607,NULL,NULL,195 ARIZONA AVE NE,0640,NULL,50,2,Morn,Thu,13,LARCENY,Edgewood,O,-84.3312,33.75902
11027668,090010267,01/01/2009 ,12/31/2008 ,22:30:00 ,01/01/2009 ,01:30:00 ,607,NULL,NULL,58 SAUNDERS ST NE,0640,NULL,50,1,Morn,Thu,20,LARCENY,Kirkwood,O,-84.32465,33.75441
11027669,090010269,01/01/2009 ,01/01/2009 ,00:35:00 ,01/01/2009 ,00:45:00 ,403,NULL,NULL,1145 PRINCESS AVE SW,0511,NULL,NULL,1,Morn,Thu,20,BURGLARY,Oakland City,S,-84.42549,33.72374
11027670,090010307,01/01/2009 ,12/31/2008 ,22:45:00 ,01/01/2009 ,01:45:00 ,402,NULL,NULL,1530 RALPH D ABERNATHY BLVD SW,0511,NULL,NULL,1,Morn,Thu,26,BURGLARY,Westview,T,-84.43767,33.74307
11027671,090010314,01/01/2009 ,12/31/2008 ,22:00:00 ,01/01/2009 ,01:45:00 ,403,NULL,NULL,14 BELMONTE CT SW,0720,NULL,NULL,1,Morn,Wed,18,AUTO THEFT,Cascade Avenue/Road,S,-84.44497,33.72544
11027672,090010350,01/01/2009 ,01/01/2009 ,02:20:00 ,01/01/2009 ,02:23:00 ,307,NULL,NULL,585 MCWILLIAMS RD SE,0720,NULL,NULL,1,Morn,Thu,18,AUTO THEFT,Browns Mill Park,Z,-84.36881,33.68869
11027673,090010358,01/01/2009 ,01/01/2009 ,00:30:00 ,01/01/2009 ,02:00:00 ,104,NULL,NULL,2174 PENELOPE ST NW,0720,NULL,50,1,Morn,Thu,18,AUTO THEFT,Penelope Neighbors,J,-84.45718,33.75449
11027674,090010359,01/01/2009 ,12/31/2008 ,22:30:00 ,01/01/2009 ,02:30:00 ,508,NULL,NULL,174 PEACHTREE ST.,0710,NULL,NULL,1,Morn,Thu,18,AUTO THEFT,Downtown,M,-84.39383,33.74954
11027675,090010365,01/01/2009 ,01/01/2009 ,02:30:00 ,01/01/2009 ,02:35:00 ,413,NULL,NULL,3458 DELMAR LN NW,0710,NULL,NULL,1,Morn,Thu,18,AUTO THEFT,Adamsville,H,-84.50064,33.75686
11027676,090010369,01/01/2009 ,12/31/2008 ,22:00:00 ,01/01/2009 ,02:00:00 ,204,NULL,NULL,3200 LENOX RD NE,0511,NULL,NULL,1,Morn,Thu,26,BURGLARY,Pine Hills,B,-84.35822,33.84221
11027677,090010373,01/01/2009 ,01/01/2009 ,02:00:00 ,01/01/2009 ,02:30:00 ,507,NULL,NULL,115 W PEACHTREE PL NW,0440,NULL,NULL,1,Morn,Thu,13,AGG ASSAULT,Downtown,M,-84.3911,33.7641
11027678,090010400,01/01/2009 ,01/01/2009 ,01:00:00 ,01/01/2009 ,01:30:00 ,509,NULL,NULL,54 PEACHTREE ST SW,0410,NULL,NULL,1,Morn,Thu,NULL,AGG ASSAULT,Downtown,M,-84.3912,33.75286
11027679,090010406,01/01/2009 ,12/31/2008 ,07:00:00 ,01/01/2009 ,02:30:00 ,311,NULL,NULL,1083 ASTOR AVE SW,0531,NULL,NULL,1,Unk,Wed,20,BURGLARY,Sylvan Hills,X,-84.42404,33.70171
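When two keys print identically but only one of them works, a quick check is to look for map keys containing anything outside printable ASCII. The module below is a minimal sketch; KeyInspector.non_ascii_keys/1 is a hypothetical helper (not part of ExCsv or Elixir), and it assumes string keys that are valid UTF-8, like the headings in the transcript. Spaces are allowed, so a heading such as "uc2 literal" is not flagged.

```elixir
defmodule KeyInspector do
  # Hypothetical helper, not part of ExCsv: returns every string key in `map`
  # that contains a codepoint outside printable ASCII (0x20..0x7E), paired
  # with that key's codepoints as hex strings so invisible characters show up.
  def non_ascii_keys(map) when is_map(map) do
    map
    |> Map.keys()
    |> Enum.filter(&is_binary/1)
    |> Enum.map(fn key -> {key, String.to_charlist(key)} end)
    |> Enum.filter(fn {_key, cps} -> Enum.any?(cps, &(&1 < 0x20 or &1 > 0x7E)) end)
    |> Enum.map(fn {key, cps} -> {key, Enum.map(cps, &Integer.to_string(&1, 16))} end)
  end
end

# Usage with a hand-built map whose first key carries an invisible U+FEFF prefix.
row_map = %{"\uFEFFid" => "11027661", "beat" => "309"}
[{_key, hex}] = KeyInspector.non_ascii_keys(row_map)
["FEFF", "69", "64"] = hex
```

Run against after_zip from iex(9), a helper like this would point straight at the key that prints as "id" but is not "id".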

peck commented 9 years ago

Found the problem: a BOM (byte order mark) at the beginning of the file. The first heading was never usable as the "id" key because it started with an invisible zero-width no-break space (U+FEFF). Ugh, Unicode!
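The diagnosis can be reproduced without ExCsv. This sketch builds a row map the same way as iex(9), but with a hand-written, trimmed-down heading list in which the first heading is prefixed with U+FEFF, as happened when the file was read. The values are a subset of the first row in the transcript.

```elixir
# U+FEFF (byte order mark / zero-width no-break space) is three bytes in UTF-8.
bom = "\uFEFF"
<<0xEF, 0xBB, 0xBF>> = bom

# Headings as they come out of a file that starts with a BOM: the invisible
# character ends up glued to the first heading.
headings = [bom <> "id", "offense_id", "beat"]
row = ["11027661", "090010061", "309"]

row_map = headings |> Enum.zip(row) |> Enum.into(%{})

nil = row_map["id"]                 # no key is exactly "id"
"11027661" = row_map[bom <> "id"]   # the real key includes the BOM
"309" = row_map["beat"]             # the other headings are unaffected

# Binaries compare byte by byte, and 0xEF is greater than any ASCII byte,
# so the BOM-prefixed key sorts after "y".
true = bom <> "id" > "y"
```

That last comparison also explains why "id" is printed last in the iex(9) and iex(10) output: in current BEAM implementations, maps with 32 or fewer keys keep their keys in term order, so the hidden prefix moves "id" to the end, while the copy-pasted map at iex(15) shows it in its usual alphabetical place.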

benwilson512 commented 9 years ago

Does Map.get(map, "id") work, or is that broken too? Also, sorry, I'm a bit unclear at this point whether this is an error in ExCsv or not.
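Because the mismatch is in the key itself rather than in the lookup mechanism, the Access syntax, Map.get/2 and Map.fetch/2 should all behave the same way. A minimal sketch with a hand-built stand-in for the iex(9) map (the BOM-prefixed key is an assumption mirroring peck's finding):

```elixir
# Hand-built stand-in for the iex(9) map: the first heading carries a BOM.
row_map = %{"\uFEFFid" => "11027661", "beat" => "309"}

# Access syntax, Map.get/2 and Map.fetch/2 all miss "id" ...
nil = row_map["id"]
nil = Map.get(row_map, "id")
:error = Map.fetch(row_map, "id")

# ... and all find the value once the invisible prefix is part of the key.
{:ok, "11027661"} = Map.fetch(row_map, "\uFEFFid")
"11027661" = Map.get(row_map, "\uFEFFid")
```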

benwilson512 commented 9 years ago

Based on our conversation on IRC, I'm going to close this, as I believe it has more to do with how Elixir handles UTF-8 than with ExCsv. We obviously want to handle all of UTF-8, and as annoying as it is, that's a valid character.
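Since the BOM ends up as part of the first heading, one workaround on the caller's side is to drop a leading UTF-8 BOM before parsing. A minimal sketch, assuming the whole file is read into memory as in iex(8); BomStrip.strip_bom/1 is a hypothetical helper, not part of ExCsv.

```elixir
defmodule BomStrip do
  # Hypothetical helper: removes a single leading UTF-8 byte order mark
  # (bytes EF BB BF) and returns any other binary unchanged.
  def strip_bom(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: rest
  def strip_bom(contents) when is_binary(contents), do: contents
end

# Usage, mirroring iex(8) (requires ExCsv; `path` points at the CSV file):
#
#   {:ok, table} =
#     path
#     |> File.read!()
#     |> BomStrip.strip_bom()
#     |> ExCsv.parse(headings: true)

"id,beat\n" = BomStrip.strip_bom("\uFEFFid,beat\n")
"id,beat\n" = BomStrip.strip_bom("id,beat\n")
```

This only handles a UTF-8 BOM at the very start of the input; files in other encodings (for example UTF-16, whose BOM bytes differ) would need to be converted first.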