donboyd5 closed this issue 1 week ago
@nikhilwoodruff, All the remaining warnings are in code you wrote. What is your timeline for eliminating these warnings?
After the merge of PR #178, we have these warnings when activating the usually-skipped test_create_file test:
============================ warnings summary ============================
tests/test_create_tmd_variables.py::test_create_file
  .../site-packages/policyengine_core/enums/enum.py:56:
  FutureWarning: Series.__getitem__ treating keys as positions is deprecated.
  In a future version, integer keys will always be treated as labels
  (consistent with DataFrame behavior). To access a value by position,
  use ser.iloc[pos]
    if isinstance(array[0], Enum):

tests/test_create_tmd_variables.py: 458 warnings
  .../tmd/utils/reweight.py:176:
  PerformanceWarning: DataFrame is highly fragmented. This is usually the
  result of calling frame.insert many times, which has poor performance.
  Consider joining all columns at once using pd.concat(axis=1) instead.
  To get a de-fragmented frame, use newframe = frame.copy()
    loss_matrix[label] = mask * values
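For context on the first warning, here is a minimal sketch (not the policyengine-core code itself) of the pandas behavior being deprecated: integer keys on a Series with a non-integer index are currently treated as positions, and the future-proof spelling is `.iloc`.

```python
import pandas as pd

# A Series with a non-integer (label) index.
ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Deprecated pattern: ser[0] treats the integer key as a position and
# emits the FutureWarning shown above. The explicit positional accessor
# avoids the ambiguity:
first = ser.iloc[0]  # positional access, no warning
```

This is why the warning points at `isinstance(array[0], Enum)`: indexing a Series with a literal `0` triggers the deprecated positional fallback.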
The first warning was reported in a policyengine-core issue some time ago, but it has not yet been fixed.

The second warning comes from the tmd/utils/reweight.py module, which uses a complex process to build the loss_matrix. The warning says this code generates a "highly fragmented" data structure that has "poor performance".
Note that PR #180 follows the warning's suggestion for defragmenting the loss_matrix, but the warnings are still generated.
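To illustrate the pattern the PerformanceWarning complains about, here is a hedged sketch (the names `labels`, `mask`, and `values` are illustrative, not the actual reweight.py code): assigning columns one at a time in a loop fragments the frame, while collecting the columns first and concatenating once does not.

```python
import numpy as np
import pandas as pd

labels = [f"target_{i}" for i in range(5)]  # hypothetical column labels
mask = np.array([1.0, 0.0, 1.0])
values = np.array([2.0, 3.0, 4.0])

# Fragmenting pattern: each assignment inserts a new column block,
# which is what triggers the PerformanceWarning after many iterations.
loss_matrix = pd.DataFrame(index=range(3))
for label in labels:
    loss_matrix[label] = mask * values

# Suggested alternative: build all columns, then join them in one call.
columns = {label: pd.Series(mask * values) for label in labels}
loss_matrix = pd.concat(columns, axis=1)
```

Note that defragmenting after the fact with `frame.copy()` (as PR #180 does) consolidates the result but does not prevent the per-insert warnings emitted during the loop, which may explain why the warnings persist.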
All the other warnings originally reported in this issue have been fixed by code improvements merged during August 2024.
Thank you, @Martin Holmer.
On Sun, Sep 1, 2024 at 3:17 PM Martin Holmer wrote:
Closed #135 https://github.com/PSLmodels/tax-microdata-benchmarking/issues/135 as completed via #180 https://github.com/PSLmodels/tax-microdata-benchmarking/pull/180.
FYI. On a clean install of tax-microdata-benchmarking as of PR 134, I get several pandas and numpy warnings. I am not sure whether new or older code is triggering them, but I don't recall seeing them before:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead.

DeprecationWarning: the interpolation= argument to percentile was renamed to method=, which has additional options. Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they used. (Deprecated NumPy 1.22)

Examples below:
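As a small sketch of the numpy rename (not the project's actual call site), the fix is mechanical: the keyword `interpolation=` becomes `method=` with the same mode strings.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Deprecated since NumPy 1.22:
#   np.percentile(data, 50, interpolation="lower")
# Current spelling uses the renamed keyword:
p = np.percentile(data, 50, method="lower")
```

Callers using 'nearest', 'lower', 'higher', or 'midpoint' should confirm the mode still matches their intent, since `method=` also added new options.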