apfeuti / covid19-rest

Provides a REST-API to get data about COVID19 cases. Data sources are openZH (for detailed figures about Switzerland) and Johns Hopkins University.
MIT License

CSV output seems to be incompatible with pandas (Python) #1

Closed: nocluebutalotofit closed this issue 4 years ago

nocluebutalotofit commented 4 years ago

import pandas as pd
data = pd.read_csv('https://covid19-rest.herokuapp.com/api/openzh/v1/all?output=csv')

To reproduce it easily, copy these two lines into a Google Colab notebook (https://colab.research.google.com/). The line number at which the ParserError occurs changes as the data gets updated.
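Because the live data (and with it the failing line number) keeps changing, an offline reproduction can be handy. The sketch below is an illustration under assumptions, not data from the service: `SAMPLE` is a made-up CSV whose column names are hypothetical, and `reproduce_parser_error` is a hypothetical helper. One unquoted comma inside the last record's source value is enough to trigger the same kind of error.

```python
import io

import pandas as pd

# Hypothetical sample, NOT real openZH data: the header has three columns, but
# the last record's "source" value contains an unquoted comma, so the CSV
# tokenizer sees four fields on line 3.
SAMPLE = (
    "date,canton,source\n"
    "2020-04-01,ZH,https://example.org/a\n"
    "2020-04-02,BE,press release, 2 April\n"
)


def reproduce_parser_error(text: str) -> str:
    """Hypothetical helper: parse `text` with pandas and return the
    ParserError message, or an empty string if parsing succeeds."""
    try:
        pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError as exc:
        return str(exc)
    return ""


print(reproduce_parser_error(SAMPLE))
```

With pandas' default C engine this should report "Expected 3 fields in line 3, saw 4", the same pattern as the traceback below. Wrapping the offending value in double quotes makes the same text parse cleanly.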

This results in:

ParserError                               Traceback (most recent call last)
in
----> 1 data = pd.read_csv('https://covid19-rest.herokuapp.com/api/openzh/v1/all?output=csv')

~/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675 
--> 676         return _read(filepath_or_buffer, kwds)
    677 
    678     parser_f.__name__ = name

~/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    452 
    453     try:
--> 454         data = parser.read(nrows)
    455     finally:
    456         parser.close()

~/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
   1131     def read(self, nrows=None):
   1132         nrows = _validate_integer("nrows", nrows)
-> 1133         ret = self._engine.read(nrows)
   1134 
   1135         # May alter columns / col_dict

~/opt/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
   2035     def read(self, nrows=None):
   2036         try:
-> 2037             data = self._reader.read(nrows)
   2038         except StopIteration:
   2039             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 18 fields in line 333, saw 19
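The message means pandas expected 18 fields per record (the count it inferred from the header) but found 19 on line 333, i.e. one extra delimiter in that record. To list every such record before handing the text to pandas, one option is to count fields per record with Python's standard `csv` module. This is a sketch: `find_bad_rows` and the sample data are hypothetical, and the text would first have to be fetched (e.g. with `urllib.request.urlopen`).

```python
import csv
import io
from typing import List, Tuple


def find_bad_rows(text: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (line_number, field_count) for every record
    whose field count differs from the header's.

    Line numbers come from csv.reader.line_num (1-based, header = line 1). For
    a record spanning several physical lines it is the record's last line.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to check
        return []
    expected = len(header)
    bad = []
    for row in reader:
        if not row:  # skip blank lines, as pandas does by default
            continue
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


# Hypothetical sample with one unquoted comma in the last record.
SAMPLE = (
    "date,canton,source\n"
    "2020-04-01,ZH,https://example.org/a\n"
    "2020-04-02,BE,press release, 2 April\n"
)
print(find_bad_rows(SAMPLE))
```

For this sample the helper flags line 3 with 4 fields, mirroring what pandas reports.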
apfeuti commented 4 years ago

Thanks, I found the bug. A bugfix is coming later.

apfeuti commented 4 years ago

Bugfix deployed: fields are now wrapped in double quotes, so the output stays valid when the payload data contains commas (especially in the source field).
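The service's own implementation is not shown in this thread. As an illustration of the same idea only, the sketch below uses Python's standard `csv` module with `quoting=csv.QUOTE_ALL`, which wraps every field in double quotes and escapes embedded quotes by doubling them. `to_csv` and the record are hypothetical.

```python
import csv
import io
from typing import Iterable, Sequence


def to_csv(header: Sequence[str], records: Iterable[Sequence[object]]) -> str:
    """Hypothetical helper: serialize records as CSV with every field wrapped
    in double quotes, so a comma inside a value cannot create an extra field.
    Embedded double quotes are escaped by doubling them (csv default)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


text = to_csv(
    ["date", "canton", "source"],
    [["2020-04-02", "BE", "press release, 2 April"]],  # hypothetical record
)
print(text, end="")
# Reading the text back gives exactly three fields per record.
print(list(csv.reader(io.StringIO(text))))
```

`csv.QUOTE_MINIMAL` (the module's default) would also avoid the error by quoting only fields that contain the delimiter, the quote character, or a line break; `QUOTE_ALL` simply mirrors the "double quotes around fields" approach described above. pandas' `read_csv` accepts either form with its default `quotechar='"'`.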

The same Python code from above works now.

nocluebutalotofit commented 4 years ago

Thank you very much for the quick fix!