Closed danielhuppmann closed 3 years ago
Merging #562 (4407f85) into main (666da23) will increase coverage by
0.0%
. The diff coverage is100.0%
.
@@ Coverage Diff @@
## main #562 +/- ##
=====================================
Coverage 93.7% 93.7%
=====================================
Files 50 50
Lines 5322 5332 +10
=====================================
+ Hits 4987 4997 +10
Misses 335 335
Impacted Files | Coverage Δ | |
---|---|---|
pyam/core.py | 94.3% <100.0%> (+<0.1%) |
:arrow_up: |
tests/test_core.py | 100.0% <100.0%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 666da23...4407f85. Read the comment docs.
While working on this PR, I noticed that pyam also has a getter df[column]
, which returns the same as the new function df.get_data_column(column)
(if column
is a dimension of data
).
I see three options:
data
or meta
is a bit confusingget_data_column()
Any thoughts @gidden or anyone else?
For reference, the implementation is reproduced below:
def __getitem__(self, key):
_key_check = [key] if isstr(key) else key
if set(_key_check).issubset(self.meta.columns):
return self.meta.__getitem__(key)
else:
return self.data.__getitem__(key)
In which case if get_data_column()
is indeed faster than self.data.__getitem__(key)
and is guaranteed to have the same outcome in all situations, then I would simply replace the last line here.
I would advise against deprecation, as this is a pretty innocuous feature and we have no idea how many users are actually using it. Not to mention, it would be almost impossible to update legacy code...
I would advise against deprecation, as this is a pretty innocuous feature and we have no idea how many users are actually using it. Not to mention, it would be almost impossible to update legacy code...
Don't quite agree that it's so innocuous... If you do df["region"]
, it's intuitive what you get (the column from df.data
), if you do df["category"]
, it's also intuitive (the column from df.meta
, if it exists). But what do you get if you do df["model"]
?
But if you think a DeprecationWarning is too strong, how about a FutureWarning - so this will give users probably a year's worth of time to update any code...?
I would strongly prefer not removing/deprecating. Have we heard anyone who has raised an issue with this?
Ok, let's leave it - wil simply implement the more efficient approach.
Great, thank you!
Implemented the more efficient approach for the direct getter IamDataFrame[<column>]
and extended the corresponding unit test for good measure...
Looks great, thanks!
Please confirm that this PR has done the following:
Description of PR
This PR adds a utility function
get_data_column(name)
as short-hand (and more efficient implementation) fordf.data[name]
, because it avoids casting the internal_data
pd.Series to a pd.DataFrame.This utility function is then used to make filtering by the time domain more performant.