djsutherland / pummeler

Utilities to analyze ACS PUMS files, especially for distribution regression / ecological inference
MIT License

sort on 2015 data #17

Closed flaxter closed 7 years ago

flaxter commented 7 years ago

Something is going wrong with the 2015 data:


$ ./pummel sort -z pums_2015.zip sorted_15 --version 2015 
File ss15pusa.csv
/ 0 Elapsed Time: 0:00:00
Traceback (most recent call last):
  File "./pummel", line 5, in <module>
    main()
  File "pummeler-head/pummeler/cli.py", line 112, in main
    args.func(args, parser)
  File "pummeler-head/pummeler/cli.py", line 122, in do_sort
    adj_inc=True, version=args.version, chunksize=args.chunksize)
  File "pummeler-head/pummeler/sort.py", line 63, in sort_by_region
    version=version):
  File "pummeler-head/pummeler/reader.py", line 19, in read_chunks
    for chunk in chunks:
  File "/homes/flaxman/.local/lib/python2.7/site-packages/pandas/io/common.py", line 113, in <lambda>
    BaseIterator.next = lambda self: self.__next__()
  File "/homes/flaxman/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 915, in __next__
    return self.get_chunk()
  File "/homes/flaxman/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 971, in get_chunk
    return self.read(nrows=size)
  File "/homes/flaxman/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 938, in read
    ret = self._engine.read(nrows)
  File "/homes/flaxman/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1507, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 846, in pandas.parser.TextReader.read (pandas/parser.c:10364)
  File "pandas/parser.pyx", line 880, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10845)
  File "pandas/parser.pyx", line 922, in pandas.parser.TextReader._read_rows (pandas/parser.c:11386)
  File "pandas/parser.pyx", line 909, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:11257)
  File "pandas/parser.pyx", line 2018, in pandas.parser.raise_parser_error (pandas/parser.c:26979)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 284 fields in line 59022, saw 318

flaxter commented 7 years ago

The data was corrupt!
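
For anyone who hits the same "Expected 284 fields ... saw 318" error, here is a rough sketch of how one might check an archive for corruption before running the sort. It is not part of pummeler: the helper names `first_bad_member`, `inconsistent_rows`, and `check_archive` are made up for illustration, the UTF-8 encoding is an assumption, and it is written for Python 3 rather than the Python 2.7 in the traceback above. It uses only the standard library: `zipfile.ZipFile.testzip()` for the CRC check, and `csv.reader` to report rows whose field count differs from the header.

```python
import csv
import io
import zipfile


def first_bad_member(zip_path):
    """Return the name of the first member that fails its CRC check, else None."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.testzip()


def inconsistent_rows(lines, limit=10):
    """Yield (line_number, n_fields) for rows whose width differs from the header.

    `lines` is any iterable of text lines; stops after `limit` mismatches.
    """
    reader = csv.reader(lines)
    expected = len(next(reader))  # the header row defines the expected width
    found = 0
    for row in reader:
        if len(row) != expected:
            yield reader.line_num, len(row)
            found += 1
            if found >= limit:
                return


def check_archive(zip_path, member, limit=10):
    """Hypothetical helper: return (bad_member, mismatched_rows) for one CSV."""
    bad = first_bad_member(zip_path)
    if bad is not None:
        return bad, []
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as raw:
        # Assumption: the CSV is UTF-8 (or plain ASCII) text.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return bad, list(inconsistent_rows(text, limit))
```

With the file names from the command above, this could be called as `check_archive("pums_2015.zip", "ss15pusa.csv")`. A badly truncated archive may raise `zipfile.BadZipFile` on open instead of returning a result, which also points to a corrupt file.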