adriennekline / psmpy

propensity score matching in python

Float division by zero in psm.logistic_ps #1

Closed Luis-vllgh closed 1 year ago

Luis-vllgh commented 1 year ago

Hi there!

I am currently using PsmPy on a rather large dataset and I got the following error in the psm.logistic_ps command:

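[image: screenshot of the error, https://user-images.githubusercontent.com/112629870/205668554-f9953871-9415-437a-8e93-9489fb360613.png]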

This seems to occur during the calculation of the propensity_logit column. I don't think that any of the propensity scores are actually equal to 1, but maybe the propensity score from the package is rounded at some stage of the algorithm? If I choose a slightly smaller dataset, it runs, and the highest propensity score ends up being "0.9999986309511635".

Do you have an idea how to fix this? Would help me out a lot! Thanks in advance
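A minimal sketch of how this kind of failure can arise, assuming the logit is computed as log(p / (1 - p)) on ordinary Python floats. This is an assumption for illustration, not PsmPy's actual code, and the names `sigmoid`, `logit`, `safe_logit`, and `eps` are hypothetical. A score that is mathematically below 1 can still round to exactly 1.0 in double precision, which makes the denominator zero. Clipping the score away from 0 and 1 is one common workaround.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, as used to turn a linear score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Naive logit; fails when p is exactly 1.0 because 1.0 - p is 0.0."""
    return math.log(p / (1.0 - p))


def safe_logit(p: float, eps: float = 1e-12) -> float:
    """Logit with p clipped to [eps, 1 - eps] (eps is an illustrative choice)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# exp(-40) is far below double-precision resolution near 1.0,
# so the propensity score rounds to exactly 1.0.
p = sigmoid(40.0)
print(p == 1.0)  # True

try:
    logit(p)
except ZeroDivisionError as exc:
    print(exc)  # float division by zero

# The clipped version stays finite.
print(safe_logit(p))
```

Clipping slightly changes the logit of extreme scores, so the choice of `eps` is a trade-off between numerical safety and fidelity at the tails.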

adriennekline commented 1 year ago

Hi Luis,

Which version are you using?

Thanks, Adrienne

(In reply to Luis Vollerigh's message of Monday, December 5, 2022 at 09:08)

Luis-vllgh commented 1 year ago

Hi,

I am using PsmPy 0.3.6.

Best regards, Luis

adriennekline commented 1 year ago

I see the problem – I’ll have to make an update to the package. Will do so tonight and upload a new version and that should work just fine. Out of curiosity, how big is your dataset?

Adrienne

(In reply to Luis Vollerigh's message of Monday, December 5, 2022 at 09:15)

Luis-vllgh commented 1 year ago

Thanks for your fast response! That is great, I am looking forward to the update.

Right now I get the error at 750,000 rows, but the dataset may grow to more than 20 million rows.

adriennekline commented 1 year ago

Ok. As you can probably guess, propensity score matching relies on k-nearest neighbors (KNN). If you have more than 20 million rows, the second step (i.e. matching) will likely become computationally infeasible.
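To illustrate what the matching step computes, here is a minimal sketch of 1-nearest-neighbor matching on a single score (for example the propensity logit), with replacement. It is not PsmPy's implementation; the function name `nearest_control_matches` and the toy values are hypothetical. A brute-force search compares every treated unit with every control, so its cost grows with the product of the two group sizes. Because this sketch matches on one number only, it sorts the controls once and uses binary search instead.

```python
from bisect import bisect_left


def nearest_control_matches(treated, controls):
    """For each treated score, return the index (into `controls`) of the
    closest control score. Matching is with replacement.
    Assumes `controls` is non-empty."""
    # Sort control indices by score so each lookup is a binary search.
    order = sorted(range(len(controls)), key=controls.__getitem__)
    sorted_vals = [controls[i] for i in order]

    matches = []
    for t in treated:
        pos = bisect_left(sorted_vals, t)
        # The nearest value is either just left or just right of `pos`.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(sorted_vals)]
        best = min(candidates, key=lambda j: abs(sorted_vals[j] - t))
        matches.append(order[best])
    return matches


controls = [0.1, 2.0, -1.5, 0.9]
treated = [1.0, -2.0, 0.4]
print(nearest_control_matches(treated, controls))  # [3, 2, 0]
```

This only covers the simplest case (one neighbor, no caliper, controls may be reused); matching without replacement or on several covariates needs a different approach.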

(In reply to Luis Vollerigh's message of Mon, Dec 5, 2022 at 9:23 AM)

Luis-vllgh commented 1 year ago

Thanks for the advice! I already have an idea for how to work around this problem. I'll see if it works.

adriennekline commented 1 year ago

Great! I can implement it in the package if you find it successful - so let me know :)

(In reply to Luis Vollerigh's message of Mon, Dec 5, 2022 at 9:31 AM)

Luis-vllgh commented 1 year ago

I'll let you know! :)

adriennekline commented 1 year ago

Great! :)

I've uploaded a new version: 3.8! Please let me know if this resolves the issue!

Thanks, Adrienne

(In reply to Luis Vollerigh's message of Tue, Dec 6, 2022 at 2:23 AM)

Luis-vllgh commented 1 year ago

Hi Adrienne, it works perfectly fine! Thanks for the update to the package :)