gidden closed this issue 4 years ago.
cc @danielhuppmann @znicholls
Tricky one, I'm not sure. I've tried auto-filling with None in OpenSCM and it hasn't been happy, so that solution, whilst ideal, might be a bit hairy to make behave (pandas can be temperamental with None and nan values). Plan 'b', filling with the column name, seems like an OK fallback, with plan 'c' being to simply force users to fill them in.
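A minimal sketch of plan 'b', assuming a hypothetical helper `fill_missing_with_name` and a hypothetical `REQUIRED_COLS` tuple; neither is part of pyam's API. Each missing required column is added and filled with its own name:

```python
import pandas as pd

# Hypothetical list of required columns (an assumption for this sketch, not pyam's).
REQUIRED_COLS = ("model", "scenario", "region", "variable", "unit")


def fill_missing_with_name(df: pd.DataFrame, required=REQUIRED_COLS) -> pd.DataFrame:
    """Plan 'b': fill each missing required column with its own name."""
    df = df.copy()  # leave the caller's frame untouched
    for col in required:
        if col not in df.columns:
            # Every row gets the column name as a placeholder value.
            df[col] = col
    return df


data = pd.DataFrame({
    "region": ["World", "World"],
    "variable": ["Population", "GDP"],
    "year": [2010, 2010],
    "value": [1.0, 2.0],
})
filled = fill_missing_with_name(data)
print(filled.loc[0, ["model", "scenario", "unit"]].tolist())  # ['model', 'scenario', 'unit']
```

Because the placeholders are ordinary strings, they behave like any other label in filtering, grouping and duplicate checks.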
I agree that all required columns other than variable can default to None (not sure how I feel about variable=None).
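For comparison, a sketch of the None-default idea, assuming a hypothetical helper `fill_defaults_none` and a hypothetical `NONE_DEFAULT_COLS` tuple. `variable` is deliberately excluded, mirroring the hesitation about variable=None:

```python
import pandas as pd

# Hypothetical: columns that may silently default to None. `variable` is
# deliberately left out, since defaulting it is the open question.
NONE_DEFAULT_COLS = ("model", "scenario", "region", "unit")


def fill_defaults_none(df: pd.DataFrame, cols=NONE_DEFAULT_COLS) -> pd.DataFrame:
    """Add each missing column in `cols` as an object column of None."""
    df = df.copy()
    for col in cols:
        if col not in df.columns:
            df[col] = None
    return df


out = fill_defaults_none(pd.DataFrame({"variable": ["GDP"], "year": [2010], "value": [1.0]}))
print(out["model"].isna().all())  # True
```

Note that pandas reports these None entries as missing (`isna()` is True), which is the root of the behaviour discussed in the rest of the thread.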
Need to check whether the "check for duplicates" part at the end of format_data() continues to work as expected.
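A sketch of the kind of duplicate check meant here, assuming (hypothetically) that it flags rows agreeing on all index columns plus year; `has_duplicates` and `INDEX_COLS` are illustrative names, not format_data()'s actual code. With plan 'b' placeholders, which are constant within each column, the check still distinguishes rows by the remaining columns:

```python
import pandas as pd

# Hypothetical index used by the duplicate check (not taken from pyam's source).
INDEX_COLS = ("model", "scenario", "region", "variable", "unit", "year")


def has_duplicates(df: pd.DataFrame, cols=INDEX_COLS) -> bool:
    """Return True if any two rows agree on every index column."""
    return bool(df.duplicated(subset=list(cols)).any())


base = pd.DataFrame({
    "region": ["World", "World"],
    "variable": ["GDP", "GDP"],
    "year": [2010, 2010],
    "value": [1.0, 2.0],
})
# Plan 'b': fill model/scenario/unit with their own names (constant per column).
filled = base.assign(model="model", scenario="scenario", unit="unit")
print(has_duplicates(filled))  # True: the two rows differ only in `value`
print(has_duplicates(filled.assign(year=[2010, 2020])))  # False
```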
Update following comment by @znicholls: if pandas behaves oddly with None in columns, forcing users to provide names might be preferable.
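A minimal sketch of plan 'c', assuming a hypothetical `require_columns` validator: instead of guessing, it reports every missing column at once so the user can add them in one go:

```python
import pandas as pd

# Hypothetical list of required columns (an assumption for this sketch).
REQUIRED_COLS = ("model", "scenario", "region", "variable", "unit")


def require_columns(df: pd.DataFrame, required=REQUIRED_COLS) -> None:
    """Plan 'c': refuse to guess and report every missing column at once."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")


complete = pd.DataFrame(columns=list(REQUIRED_COLS) + ["year", "value"])
require_columns(complete)  # passes silently

try:
    require_columns(complete.drop(columns=["model", "unit"]))
except ValueError as err:
    print(err)  # missing required column(s): ['model', 'unit']
```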
One more thought about None in columns: how do we expect things to behave if we append an IamDataFrame with model=None to a "regular" frame? df.filter(model=None) will not work (I think) and would also conflict with the changes suggested in #207.
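A plain-pandas illustration of why equality-based selection struggles with None (this is not pyam's filter() implementation): pandas treats `== None` like a comparison with a missing value, so it matches no rows, and an explicit null test is needed instead:

```python
import pandas as pd

df = pd.DataFrame({
    # object dtype so the missing model is stored as a literal None
    "model": pd.Series(["foo", None], dtype=object),
    "variable": ["GDP", "GDP"],
    "value": [1.0, 2.0],
})

# `== None` is False on every row, including the row that actually holds None.
by_equality = df[df["model"] == None]  # noqa: E711 (intentional)
print(len(by_equality))  # 0

# Selecting the rows with no model needs an explicit null test instead.
by_isna = df[df["model"].isna()]
print(len(by_isna))  # 1
```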
Hmm, OK, so maybe None is a bad idea. nan could work, but it also creates plenty of havoc with pandas (and wouldn't work with the current drop_duplicate call in format_data).
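One concrete source of that havoc, shown with plain pandas rather than pyam code: by default, groupby() drops rows whose key is None or nan, so aggregations over an auto-filled column silently lose data unless every call site opts in with dropna=False (available since pandas 1.1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "model": pd.Series(["foo", None, np.nan], dtype=object),
    "value": [1.0, 2.0, 4.0],
})

# Default behaviour: rows with a missing key are excluded from the groups.
default_sum = df.groupby("model")["value"].sum()
print(default_sum.to_dict())  # {'foo': 1.0}

# Keeping them requires passing dropna=False explicitly.
kept_sum = df.groupby("model", dropna=False)["value"].sum()
print(kept_sum.sum())  # 7.0
```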
This issue has been resolved in the sense that the constructor now takes keyword arguments that set a default value for any column not present in the input dataframe, as suggested above:
y = pyam.IamDataFrame(df, value=['Population', 'GDP', 'Urbanization'], model='foo', region='bar', unit='baz')
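For readers without pyam at hand, the effect of that call can be approximated in plain pandas. This is only a sketch of the idea, assuming a wide input frame with hypothetical `scenario` and `year` columns and dummy values; it is not how pyam implements the constructor internally:

```python
import pandas as pd

# Hypothetical wide input: one column per variable, no model/region/unit columns.
wide = pd.DataFrame({
    "scenario": ["s1", "s1"],
    "year": [2010, 2020],
    "Population": [1.0, 2.0],
    "GDP": [3.0, 4.0],
    "Urbanization": [0.5, 0.6],
})
value_cols = ["Population", "GDP", "Urbanization"]

# Melt the value columns into long format: their names become `variable`.
long = wide.melt(
    id_vars=["scenario", "year"],
    value_vars=value_cols,
    var_name="variable",
    value_name="value",
)
# Fill the columns absent from the input with the same defaults as the pyam call.
long = long.assign(model="foo", region="bar", unit="baz")
long = long[["model", "scenario", "region", "variable", "unit", "year", "value"]]
print(long.shape)  # (6, 7)
```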
During PR #199 we had a use case that became unsupported in the final implementation, namely filling in "missing" values in expected columns.
For example, a dataframe looking like
At the moment, this raises an error:
At one point in the PR, default values were filled in for these three columns (simply their column names) for ease of use. In many cases, I don't actually care what these values are; I just want the mountain of other nice pyam utilities to work with my data. So the question is: should we force users to fill these in, e.g.,
or should we do that for them with column names or some other value?