facebookresearch / balance

The balance python package offers a simple workflow and methods for dealing with biased data samples when looking to infer from them to some target population of interest.
https://import-balance.org
GNU General Public License v2.0
686 stars 42 forks source link

Address deprecated pandas syntax #46

Closed ahakso closed 1 year ago

ahakso commented 1 year ago

This PR changes a couple minor deprecated pandas syntax choices.

Passing a set is deprecated. This PR converts the set to a list before indexing with it. This key/value pair in the response doesn't seem to be used elsewhere, but I left it as a set for Chesterton's fence reasons.

Since this warning is surfaced with the first code snippet on the README, it's nice to avoid this warning

talgalili commented 1 year ago

Thanks! I'll take a look at it later today after the tests finish running.

ahakso commented 1 year ago

Ah; so the attempt to assign the column with an attribute actually results in a no-op. When correcting the assignment, the target population gains a new column that is response variable, happiness. This breaks a test.

Since we don't expect to have the response variable in the target population, my recommendation is just to remove the attempted assignment of the happiness column...

I removed the assignment line, and tests are passing now. Alternatively, I could replace it and update the test to expect the response variable in the target

talgalili commented 1 year ago

I'll take a look within an hour once I'm by a computer. Thanks!

talgalili commented 1 year ago

Hey @ahakso I took a look, thanks for the PR!

Both changes seems good to me, but:

  1. The tests seem to fail - could you please take a look at it?
  2. Before I import it, could you please just update the CHANGELOG file with your fixes? (you can add a "Bug Fixes" section to 0.9.0 release). https://github.com/facebookresearch/balance/blob/main/CHANGELOG.md
ahakso commented 1 year ago

Great - yes, tests pass locally, but not in the workflow. I'll see if I can figure out why the random draws differ and update the changelog; thanks

talgalili commented 1 year ago

@ahakso thinking about it - I have a guess on why this would fail. The random seed is set at the beginning, and every call to another function that creates random values will (conceptually) move the random seed for the next function. So when removing a line that creates random numbers, the following calls will produce different random numbers - which will lead the tests to fail.

I think the easiest solution would be to get the target_df.happiness back, and update the tests.

WDYT?

ahakso commented 1 year ago

Yep, had just come to come to the same realization myself... My local tests were passing because the imports in pytest are importing from the pip-installed balance 🤦

That's a reasonable option, but I like that the target population df is currently missing the measured happiness value that the sample is trying to estimate for the population. And if we add happiness to it, we'll need to add the resulting hardcoded values to the test in any case. Any objection to me just changing the hardcoded expected test values downstream of the removed random number generation?

talgalili commented 1 year ago

The original point of having happiness in the target was to see what the real population value is. It's not a realistic case, of course, but it allows to discuss how close the adjustment is getting us to the "real" value.

WDYT?

On Tue, 9 May 2023, 22:43 ahakso, @.***> wrote:

Yep, had just come to come to the same realization myself... My local tests were passing because the imports in pytest are importing from the pip-installed balance 🤦

That's a reasonable option, but I like that the target population df is currently missing the measured happiness value that the sample is trying to estimate for the population. And if we add happiness to it, we'll need to add the resulting hardcoded values to the test in any case. Any objection to me just changing the hardcoded expected test values downstream of the removed random number generation?

— Reply to this email directly, view it on GitHub https://github.com/facebookresearch/balance/pull/46#issuecomment-1540794543, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHOJBVWJJ5YNBMWEJP23SLXFKM7NANCNFSM6AAAAAAX3SFUFA . You are receiving this because you commented.Message ID: @.***>

ahakso commented 1 year ago

That seems reasonable. I'll go ahead & do that.

talgalili commented 1 year ago

Thanks! I'll run the tests and if they're green I'll get the PR merged during the day tomorrow (I'll need a sign off by another reviewer before it's merged. It should take less than a day)

facebook-github-bot commented 1 year ago

@talgalili has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

talgalili commented 1 year ago

I fixed some linting warnings, and fixed the CHANGELOG a bit. The PR should land within the next 30 minutes.

facebook-github-bot commented 1 year ago

@talgalili merged this pull request in facebookresearch/balance@7a117a3009db62319e1db7f87363ad7c43368963.

talgalili commented 1 year ago

PR merged, thanks for your contribution.

talgalili commented 1 year ago

BTW, a followup PR to this would be to update the tutorials a bit so happiness is defined as a target for the target population.

Just a thought.

ahakso commented 1 year ago

True, I'll do that if I get the chance. There are also a fairly significant number of additional deprecating operations here that would be nice (but less visible) to address.

This was my first open source contribution, tiny as it was. Thanks for making it a perfectly pleasant experience!

talgalili commented 1 year ago

My pleasure!

I'd be happy to support whichever other contribution you'd be interested in making. The package still has many legacy issues to resolve which we have not yet gotten to address, so any help would be appreciated.

(P.s.: I've added a "thank you" to you in the CHANGELOG)

On Wed, 10 May 2023, 20:01 ahakso, @.***> wrote:

True, I'll do that if I get the chance. There are also a fairly significant number of additional deprecating operations here that would be nice (but less visible) to address.

This was my first open source contribution, tiny as it was. Thanks for making it a perfectly pleasant experience!

— Reply to this email directly, view it on GitHub https://github.com/facebookresearch/balance/pull/46#issuecomment-1542532797, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHOJBRES2KLQBV3I2AI3PDXFPCUZANCNFSM6AAAAAAX3SFUFA . You are receiving this because you were mentioned.Message ID: @.***>