IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Unclear error reporting when reading broken files #588

Closed phackstock closed 2 years ago

phackstock commented 2 years ago

After a discussion with @guofei2016, we identified the following problem: Say we want to read an Excel file where a cell in a year column contains a value that cannot be cast to a float. Right now we get the following error:

import pyam
iam_df = pyam.IamDataFrame("minimum_failure_scenario.xlsx")
# raises ValueError: could not convert string to float: ''

I have attached the file minimum_failure_scenario.xlsx. It contains two models, and one time series per model. For the year 2005, the value for model 2 is " " (a single blank space).

Proposed solution

The error originates in pyam/utils.py line 324, where we call df["value"] = df["value"].astype("float64"). I propose using pd.to_numeric(df["value"]) instead of astype. The difference is that the error raised by pd.to_numeric states the position of the value that could not be parsed, which should help the user locate the error. We could wrap this in a try-except clause to further enhance the error and give the index combination (model, scenario, etc.). This might even be necessary, as we are dealing with a long-format table at this point, so the reported position does not correspond directly to a row in the Excel file.
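To illustrate the difference, here is a minimal sketch that compares the two error messages on a small series with a blank-space entry like the one in the attached file. The series contents and the helper name conversion_error are made up for this illustration and are not part of pyam.

```python
import pandas as pd

# Illustrative stand-in for the "value" column: the second entry is a
# single blank space, as in the attached file.
values = pd.Series(["1.0", " "], dtype=object, name="value")


def conversion_error(convert):
    """Return the message of the ValueError raised by `convert`, or None."""
    try:
        convert(values)
    except ValueError as e:
        return str(e)
    return None


# Current approach: the message names the bad string, but not where it is.
astype_msg = conversion_error(lambda s: s.astype("float64"))

# Proposed approach: the message also states the position of the bad value.
to_numeric_msg = conversion_error(pd.to_numeric)

print(astype_msg)
print(to_numeric_msg)
```

Note that the position is counted along the long-format column, so it still has to be translated into a model/scenario combination to be useful to someone looking at the Excel file.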

Thoughts on this @danielhuppmann? If you think this is a reasonable solution I'd open a PR with my proposed fix.

danielhuppmann commented 2 years ago

Thanks for investigating @phackstock and for proposing the solution, looks good to me!

Note that there is already a utility function raise_data_error() to write a message (in this case maybe "Non-numeric data in a value column") plus the 'index' (model-scenario-...) where the error occurs.
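As a rough sketch of what such a message could look like, the snippet below raises an error that combines the suggested text with the index of the offending rows. The function report_non_numeric and the INDEX_COLS list are hypothetical stand-ins written for this illustration; they do not reproduce the signature or behaviour of raise_data_error() in pyam.

```python
import pandas as pd

# Assumed index columns for this sketch; the thread only mentions
# "model-scenario-...".
INDEX_COLS = ["model", "scenario", "year"]


def report_non_numeric(df, msg="Non-numeric data in a value column"):
    """Hypothetical helper: cast 'value' to float or name the failing rows."""
    # Coercing turns unparseable entries into NaN, which makes them selectable.
    coerced = pd.to_numeric(df["value"], errors="coerce")
    # Exclude entries that were already missing before the conversion.
    bad = df.loc[coerced.isna() & df["value"].notna(), INDEX_COLS]
    if not bad.empty:
        raise ValueError(f"{msg}:\n{bad.to_string(index=False)}")
    return df.assign(value=coerced)


# Made-up long-format data: two models, one time series each.
df = pd.DataFrame(
    {
        "model": ["model 1", "model 2"],
        "scenario": ["scenario", "scenario"],
        "year": [2005, 2005],
        "value": pd.Series(["1.0", " "], dtype=object),
    }
)

try:
    report_non_numeric(df)
except ValueError as e:
    error_message = str(e)
    print(error_message)
```

Coercing first and then selecting the NaN rows reports all offending rows at once, rather than only the first one encountered.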

phackstock commented 2 years ago

Ah perfect, thanks for the quick reply @danielhuppmann. I'll see whether I can use raise_data_error() in my proposed solution.