PSLmodels / tax-microdata-benchmarking

A project to develop a benchmarked general-purpose dataset for tax reform impact analysis.
https://pslmodels.github.io/tax-microdata-benchmarking/

FYI: Pandas and numpy warnings on final 3 tests of clean install of tax-microdata-benchmarking as of PR 134 #135

Closed donboyd5 closed 1 week ago

donboyd5 commented 2 months ago

FYI. On clean install of tax-microdata-benchmarking as of PR 134:

[screenshot not preserved]

I get several pandas and numpy warnings. I am not sure if new code, or older code, is triggering these warnings, but I don't recall seeing them before:

Examples below:

[four screenshots of warning output not preserved]

martinholmer commented 1 month ago

@nikhilwoodruff, All the remaining warnings are in code you wrote. What is your timeline for eliminating these warnings?

martinholmer commented 1 week ago

After the merge of PR #178, we get these warnings when activating the usually skipped test_create_file test:

============================ warnings summary ============================
tests/test_create_tmd_variables.py::test_create_file
  .../site-packages/policyengine_core/enums/enum.py:56: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]
    if isinstance(array[0], Enum):

tests/test_create_tmd_variables.py: 458 warnings
  .../tmd/utils/reweight.py:176: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
    loss_matrix[label] = mask * values

The first warning was reported in a policyengine-core issue some time ago, but it has not yet been fixed.
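For reference, the change the first warning asks for can be sketched as below. This is a minimal illustration, not the policyengine-core fix: the Series `ser`, its labels, and the helper `first_value` are hypothetical, and it assumes the object being indexed is a pandas Series, as the warning text suggests.

```python
import warnings

import pandas as pd

# Hypothetical Series whose index is not the default 0..n-1 integers.
ser = pd.Series(["SINGLE", "JOINT"], index=["a", "b"])


def first_value(series: pd.Series):
    """Return the first element by position rather than by label."""
    # .iloc is explicitly positional, so it does not depend on the
    # deprecated "integer key treated as position" fallback of series[0].
    return series.iloc[0]


# Treat FutureWarning as an error to confirm the positional lookup is clean.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    value = first_value(ser)

print(value)
```

Applied to the line quoted in the warning, the analogous change would be replacing `array[0]` with `array.iloc[0]` when `array` is a Series; whether that is appropriate for every input type in enum.py is not something this thread establishes.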

The second warning comes from the tmd/utils/reweight.py module, which uses a complex process to build the loss_matrix. The warning says that the column assignment on line 176 of that module produces a "highly fragmented" DataFrame, which has "poor performance".

Note that PR #180 follows the warning suggestion on how to defragment the loss_matrix, but the warnings are still generated.
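One plausible reason the warnings persist is that pandas emits the PerformanceWarning at the moment each column is inserted into an already fragmented frame, so copying the frame afterwards defragments it but cannot retract warnings already issued. The other suggestion in the warning, building all columns first and assembling the frame once, can be sketched as follows. This is only an illustration under stated assumptions: the sizes, the random data, and the `target_{i}` labels are hypothetical stand-ins, and this is not the code in reweight.py or in PR #180.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sizes and data standing in for the real targets.
rng = np.random.default_rng(0)
n_rows, n_targets = 1000, 200

# Collect each column in a plain dict instead of assigning
# loss_matrix[label] = mask * values on a growing DataFrame.
columns = {}
for i in range(n_targets):
    mask = rng.random(n_rows) > 0.5   # boolean selector for this target
    values = rng.random(n_rows)       # per-record contribution
    columns[f"target_{i}"] = mask * values

# Build the frame in a single step; escalate PerformanceWarning to an
# error to confirm that this construction path does not trigger it.
with warnings.catch_warnings():
    warnings.simplefilter("error", pd.errors.PerformanceWarning)
    loss_matrix = pd.DataFrame(columns)

print(loss_matrix.shape)
```

Constructing the frame from a dict of arrays is equivalent in spirit to the `pd.concat(axis=1)` approach the warning mentions; either way no column is inserted into an existing frame.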

All the other warnings originally reported in this issue have been fixed by code improvements merged during August 2024.

donboyd5 commented 1 week ago

Thank you, Martin Holmer.

On Sun, Sep 1, 2024 at 3:17 PM Martin Holmer wrote:

Closed #135 (https://github.com/PSLmodels/tax-microdata-benchmarking/issues/135) as completed via #180 (https://github.com/PSLmodels/tax-microdata-benchmarking/pull/180).
