UMEP-dev / SuPy

SUEWS that speaks Python
https://supy.readthedocs.io/
GNU General Public License v3.0

Error invalid continuation byte when loading input files #35

Closed sunt05 closed 2 years ago

sunt05 commented 3 years ago

Issue originally submitted by @MatthewPaskin here: https://github.com/Urban-Meteorology-Reading/Urban-Meteorology-Reading.github.io/issues/12#issue-972581391

Describe the Issue

Hi, I am trying to do a SUEWS run using supy with the input files from the 2020a download. Running

df_state_init = sp.init_supy(path_runcontrol)

gives this error:

UnicodeDecodeError                        Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_24724/777218402.py in <module>
----> 1 df_state_init = sp.init_supy(path_runcontrol)

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_supy_module.py in init_supy(path_init, force_reload, check_input)
    101         if path_init_x.suffix == ".nml":
    102             # SUEWS `RunControl.nml`:
--> 103             df_state_init = load_InitialCond_grid_df(
    104                 path_init_x, force_reload=force_reload
    105             )

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in load_InitialCond_grid_df(path_runcontrol, force_reload)
   1588     # load base df of InitialCond
   1589     logger_supy.debug("loading base df_init...")
-> 1590     df_init = load_SUEWS_InitialCond_df(path_runcontrol)
   1591 
   1592     # add Initial Condition variables from namelist file

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in load_SUEWS_InitialCond_df(path_runcontrol)
   1325     path_input = path_runcontrol.parent / dict_ModConfig["fileinputpath"]
   1326     logger_supy.debug("loading df_gridSurfaceChar")
-> 1327     df_gridSurfaceChar = load_SUEWS_SurfaceChar_df(path_input)
   1328     # df_gridSurfaceChar.to_pickle("df_gridSurfaceChar.pkl")
   1329     # only use the first year of each grid as base for initial conditions

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in load_SUEWS_SurfaceChar_df(path_input)
   1098 @functools.lru_cache(maxsize=16)
   1099 def load_SUEWS_SurfaceChar_df(path_input):
-> 1100     df_gridSurfaceChar_exp = gen_df_gridSurfaceChar_exp(path_input)
   1101     dict_var_ndim = {
   1102         "ahprof_24hr": (24, 2),

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in gen_df_gridSurfaceChar_exp(path_input)
   1082 @functools.lru_cache(maxsize=16)
   1083 def gen_df_gridSurfaceChar_exp(path_input):
-> 1084     df_siteselect_exp = gen_df_siteselect_exp(path_input)
   1085     dict_var_tuple = exp_dict_full(dict_var2SiteSelect)
   1086     df_gridSurfaceChar_exp = pd.concat(

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in gen_df_siteselect_exp(path_input)
    956 def gen_df_siteselect_exp(path_input):
    957     # df with all code-references values
--> 958     df_all_code = gen_all_code_df(path_input)
    959 
    960     # retrieve all `Code`-relaed names

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in gen_all_code_df(path_input)
    875 @functools.lru_cache(maxsize=16)
    876 def gen_all_code_df(path_input):
--> 877     dict_libs = load_SUEWS_Libs(path_input)
    878     df_siteselect = dict_libs["lib_SiteSelect"]
    879     list_code = [code for code in df_siteselect.columns if to_exp_Q(code)]

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in load_SUEWS_Libs(path_input)
    268         # lib_path = os.path.join(path_input, lib_file)
    269         lib_path = path_input / lib_file
--> 270         dict_libs.update({lib: load_SUEWS_table(lib_path)})
    271     # return DataFrame containing settings
    272     return dict_libs

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\supy\_load.py in load_SUEWS_table(path_file)
    247         # fileX = path_insensitive(fileX)
    248         str_file = str(path_file)
--> 249         rawdata = pd.read_csv(
    250             str_file,
    251             delim_whitespace=True,

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    584     kwds.update(kwds_defaults)
    585 
--> 586     return _read(filepath_or_buffer, kwds)
    587 
    588 

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)
    480 
    481     # Create the parser.
--> 482     parser = TextFileReader(filepath_or_buffer, **kwds)
    483 
    484     if chunksize or iterator:

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\io\parsers\readers.py in __init__(self, f, engine, **kwds)
    809             self.options["has_index_names"] = kwds["has_index_names"]
    810 
--> 811         self._engine = self._make_engine(self.engine)
    812 
    813     def close(self):

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\io\parsers\readers.py in _make_engine(self, engine)
   1038             )
   1039         # error: Too many arguments for "ParserBase"
-> 1040         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
   1041 
   1042     def _failover_to_python(self):

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in __init__(self, src, **kwds)
     67         kwds["dtype"] = ensure_dtype_objs(kwds.get("dtype", None))
     68         try:
---> 69             self._reader = parsers.TextReader(self.handles.handle, **kwds)
     70         except Exception:
     71             self.handles.close()

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

~\Anaconda3\envs\my_dev_env_supy\lib\site-packages\pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1330: invalid continuation byte
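
For context, the last line of the traceback can be reproduced without SuPy. The sketch below is only an illustration: the sample bytes are made up, and the thread never confirms which input file or character caused the failure. Byte 0xe4 is "ä" in Latin-1 and Windows-1252, whereas in UTF-8 it opens a three-byte sequence, so an ASCII byte right after it triggers "invalid continuation byte".

```python
# Made-up sample: "K", byte 0xe4 ("ä" in Latin-1), then "se".
raw = b"K\xe4se"

try:
    raw.decode("utf-8")
    error_reason = None
except UnicodeDecodeError as err:
    # 0xe4 starts a 3-byte UTF-8 sequence, but "s" is not a continuation byte.
    error_reason = err.reason

# The same bytes decode cleanly when treated as Latin-1.
decoded_latin1 = raw.decode("latin-1")

print(error_reason)
print(decoded_latin1)
```

If the 2020a tables do contain such a byte (an assumption), it would most likely come from a non-ASCII character in a comment or site name saved in a legacy Windows encoding.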

sunt05 commented 3 years ago

@MatthewPaskin, more information is needed for diagnostics:

  1. version info: run sp.show_version() and report the output.
  2. input files: RunControl.nml and other SUEWS_xx.txt tables.
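
Alongside the information requested above, a quick way to narrow down the offending table is to check which input files are not valid UTF-8. This is a minimal sketch using only the standard library; `find_non_utf8_files` is a hypothetical helper, not part of SuPy, and the `*.txt` pattern assumes the tables follow the `SUEWS_xx.txt` naming mentioned above.

```python
from pathlib import Path


def find_non_utf8_files(dir_input, pattern="*.txt"):
    """Return (file name, byte offset, offending byte) for files that are not valid UTF-8.

    Hypothetical diagnostic helper; not part of SuPy.
    """
    problems = []
    for path_file in sorted(Path(dir_input).glob(pattern)):
        raw = path_file.read_bytes()
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that failed to decode.
            problems.append((path_file.name, err.start, hex(raw[err.start])))
    return problems
```

Calling it on the directory that `fileinputpath` in `RunControl.nml` points to would list any file containing a byte such as 0xe4, together with its position, which can then be compared with the position reported in the traceback.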

sunt05 commented 2 years ago

@MatthewPaskin please let me know if this issue still persists. If so, please provide more info as needed above; otherwise, please close this issue.

MatthewPaskin commented 2 years ago

@sunt05 Apologies, this issue can now be closed. I do not believe I can do that myself, as this is a copy of my original issue.

sunt05 commented 2 years ago

Ok, thanks for the update. I'll close this for now.

But feel free to reopen this issue if further help is needed.

suegrimmond commented 2 years ago

@sunt05 - he was an undergrad working with the group in the summer, so he is no longer around

sunt05 commented 2 years ago

Thanks @suegrimmond.

I'm just cleaning up the issues and wanted to identify those that can be fixed quickly.