Closed by MaxGhenis 4 months ago
Will work on this, but need to chat with Nikhil, as when I try running reforms in Python notebooks, I receive other errors and am thus unable to proceed
Have you tried Colab? e.g. the notebook linked in https://github.com/PolicyEngine/policyengine-us/issues/3634
Unfortunately, yes. I ran a policy reform within the app (increased the rate on IRS tax bracket 3) and received an arcane syntax error. I then ran it with a different one, but I've forgotten the outcome. Let me try again with the notebook you linked; perhaps I made some sort of mistake?
Even after integrating the `use_reported_state_income_tax` setting into the notebook, I receive the following error:
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead.
The full notebook is available here. Is there a bug in another portion of the Reproduce in Python code that I'm missing? Or am I just bad at setting up Colab notebooks?
I'd suggest testing with `calc` rather than `calculate_dataframe`, which is finicky in several regards. See #653
Could definitely do that; that said, what I have in the Colab notebook is what we output currently in the app. Should #653 be reopened to change that verbiage?
We should remove `in_poverty` from the snippet or remove the last line
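For context, the `TypeError` quoted earlier is the one NumPy raises when two boolean arrays are subtracted, which is presumably what happens to a boolean column such as `in_poverty` when the reformed and baseline dataframes are differenced. A minimal sketch, using made-up arrays as hypothetical stand-ins for that column (not actual PolicyEngine output):

```python
import numpy as np

# Hypothetical stand-ins for a boolean column such as in_poverty.
baseline_flag = np.array([True, False, True])
reformed_flag = np.array([False, False, True])

# Subtracting boolean arrays is not supported by NumPy.
try:
    reformed_flag - baseline_flag
    subtraction_failed = False
except TypeError as error:
    subtraction_failed = True
    print(f"TypeError: {error}")

# Casting to integers first makes the difference well defined:
# -1 means the flag turned off, +1 means it turned on, 0 means no change.
change = reformed_flag.astype(int) - baseline_flag.astype(int)
print(change)
```

Casting boolean columns before differencing is one alternative to dropping `in_poverty` altogether, though whether it suits the snippet is a judgment call.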
Upon further testing: if we merely remove `in_poverty` and maintain the dataframes, everything works properly. However, with the simple `calculate`, I actually receive the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[<ipython-input-6-4c6a3821c8c3>](https://localhost:8080/#) in <cell line: 26>()
24 reformed = Microsimulation(reform=reform)
25 HOUSEHOLD_VARIABLES = ["person_id", "household_id", "age", "household_net_income", "household_income_decile", "household_tax", "household_benefits"]
---> 26 baseline_person_df = baseline.calculate(HOUSEHOLD_VARIABLES, 2024)
27 reformed_person_df = reformed.calculate(HOUSEHOLD_VARIABLES, 2024)
28 difference_person_df = reformed_person_df - baseline_person_df
1 frames
[/usr/local/lib/python3.10/dist-packages/policyengine_core/simulations/simulation.py](https://localhost:8080/#) in calculate(self, variable_name, period, map_to, decode_enums)
371 )
372
--> 373 np.random.seed(hash(variable_name + str(period)) % 1000000)
374
375 try:
TypeError: can only concatenate list (not "str") to list
It seems to me that switching over to `calculate` would require a bit more reworking. If you're looking for a quick turnaround that ensures this code works properly, I could simply remove `in_poverty` for all contexts and add the `use_reported_state_income_tax` setting for US-wide simulations. We could then keep #653 open and work on that separately.
`calculate`/`calc` only takes a single variable at a time
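The traceback above fails on the expression `variable_name + str(period)`, which only works when `variable_name` is a string. A minimal sketch of that failure, using a hypothetical helper `seed_for` that copies the expression from the traceback (it is not the real `policyengine_core` function):

```python
def seed_for(variable_name, period):
    # Same expression as the line shown in the traceback.
    return hash(variable_name + str(period)) % 1000000

variables = ["household_net_income", "household_tax"]

# Passing the whole list reproduces the TypeError from the traceback.
try:
    seed_for(variables, 2024)
    list_call_failed = False
except TypeError as error:
    list_call_failed = True
    message = str(error)
    print(message)

# Passing one variable name at a time works.
seeds = {name: seed_for(name, 2024) for name in variables}
```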
As I'm looking through this, does it make sense to replace `calculate_dataframe` with `calculate`, since we'd have to place the resulting series of series within some sort of container to allow for an extra dimension of data? I noticed that Nikhil also seems to have objected to using the `calculate` function in the original #653.
We encourage analysts to use `calc`. @nikhilwoodruff if you stand by that comment could you share your justification? Microdataframes have caused several bugs in usage and most of the analysis I've seen uses series.
The main reason I ask is that it doesn't appear to be possible to use `calc` to calculate all of the output variables we currently include without running a separate calculation for each one. Would you like that written into the code?
Alternatively, I worked up a Colab notebook that removes all of the output variables except household net income with the CTC as an example policy - would that be the preferred direction to move in?
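The per-variable approach raised above could look roughly like the sketch below. `StubSimulation` and its fixed values are hypothetical stand-ins so the example runs without PolicyEngine; the only behaviour borrowed from the thread is that `calc` takes one variable name at a time.

```python
class StubSimulation:
    """Hypothetical stand-in for a Microsimulation; returns fixed values."""

    def __init__(self, values):
        self._values = values

    def calc(self, variable_name, period):
        # Accepts a single variable name, as described in the thread.
        if not isinstance(variable_name, str):
            raise TypeError("calc takes a single variable name")
        return list(self._values[variable_name])


def difference_by_variable(baseline, reformed, variables, period):
    """Call calc once per variable and collect reform-minus-baseline values."""
    differences = {}
    for name in variables:
        base = baseline.calc(name, period)
        ref = reformed.calc(name, period)
        differences[name] = [r - b for r, b in zip(ref, base)]
    return differences


OUTPUT_VARIABLES = ["household_net_income", "household_tax"]
baseline = StubSimulation(
    {"household_net_income": [100.0, 200.0], "household_tax": [10.0, 20.0]}
)
reformed = StubSimulation(
    {"household_net_income": [110.0, 200.0], "household_tax": [5.0, 20.0]}
)

differences = difference_by_variable(baseline, reformed, OUTPUT_VARIABLES, 2024)
print(differences)
```

A dictionary keyed by variable name is one simple container for the extra dimension; it avoids dataframes entirely at the cost of one call per variable.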
Let's prepopulate with the code from your notebook
Reopening due to #1274.
Reproduce in Python does not currently reproduce the US microsimulation results in the web app, because we don't set the `simulation.use_reported_state_income_tax` parameter to `True`.