Closed jcombaz closed 5 months ago
Hi Jacques, thanks for bringing this up. I just thought it through, and I think I agree. Though shuffling one is probably sufficient. I have the feeling that maybe the multinomial sampler was implemented differently before at some point and then this error got introduced.
Just out of interest, do you have any statistical analysis that shows the before and after?
Hi!
Yes, I think one shuffling is enough, but I haven't tested it.
I discovered this "statistical bug" when developing alternative approaches to Aina's analysis of carpooling. Here is an example of an analysis of commute trips based on the synthetic population, in which two trips can be carpooled iff their origins, destinations and departure times are close enough. The following graph shows the total number of "carpoolable" kilometers depending on the spatial constraint on origins and destinations (in meters). The green curve corresponds to the case where origins and destinations are not shuffled, and the purple curve when they are shuffled. The relative difference between these two curves can be significant for smaller values of the spatial parameter.
Regards,
Jacques.
-- Verimag - Bâtiment IMAG - Université Grenoble Alpes 150 place du torrent 38401 Saint Martin d’Hères @.*** www-verimag.imag.fr/~jcombaz phone: +33 (0)4 57 42 22 10 fax: +33 (0)4 57 42 22 22
From: "Sebastian Hörl" @.> To: "eqasim-org/ile-de-france" @.> Cc: "Jacques Combaz" @.>, "Author" @.> Sent: Monday 27 May 2024 18:26:43 Subject: Re: [eqasim-org/ile-de-france] Statistical bug in primary locations attribution (Issue #237)
Ok, makes sense. Do you want to create a PR with the fix? Otherwise I can look into it end of the week.
Yes, I can do that. Should I do my PR directly on the develop branch?
Jacques.
From: "Sebastian Hörl" @.> To: "eqasim-org" @.> Cc: "Jacques Combaz" @.>, "Author" @.> Sent: Tuesday 28 May 2024 16:14:56 Subject: Re: [eqasim-org/ile-de-france] Statistical bug in primary locations attribution (Issue #237)
Yes, thanks!
From: jcombaz @.> Sent: 28 May 2024 16:36 To: eqasim-org/ile-de-france @.> Cc: Sebastian HORL @.>; Comment @.> Subject: Re: [eqasim-org/ile-de-france] Statistical bug in primary locations attribution (Issue #237)
In the file `synthesis/population/spatial/primary`, the following code introduces significant spatial correlations between home and primary locations, since duplicated values are consecutive in the data generated by `numpy.repeat` (which is used to sample locations among candidates). In particular, the probability of two commutes having the exact same origins and destinations is significantly larger than what it should be.

To fix this the attribution of origins to destinations should be made independent of the order in the data sets, e.g.:
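As an illustration only, here is a minimal sketch of the effect on toy data, together with one possible way to break the ordering. It is not the repository's code: the names `home_ids`, `work_ids` and `count_distinct_pairs`, the zone sizes and the sampling weights are all hypothetical. It permutes only the destinations, on the assumption raised in this thread that shuffling one side is sufficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 4 home zones with 250 commuters each, 3 candidate workplaces.
home_ids = np.repeat(np.arange(4), 250)

# Sample how many commuters go to each workplace, then expand the counts into
# one identifier per commuter. np.repeat leaves equal identifiers consecutive.
counts = rng.multinomial(home_ids.size, [0.5, 0.3, 0.2])
work_ids = np.repeat(np.arange(3), counts)


def count_distinct_pairs(origins, destinations):
    """Number of distinct (origin, destination) pairs under row-wise pairing."""
    return len(set(zip(origins.tolist(), destinations.tolist())))


# Pairing by row order: both arrays are sorted, so only a few pairs can occur.
ordered_pairs = count_distinct_pairs(home_ids, work_ids)

# Possible fix: permute one side so the pairing no longer depends on row order.
shuffled_work_ids = rng.permutation(work_ids)
shuffled_pairs = count_distinct_pairs(home_ids, shuffled_work_ids)

print(ordered_pairs, shuffled_pairs)
```

With both arrays sorted, row-wise pairing can reach at most 4 + 3 - 1 = 6 of the 12 possible pairs, so identical origin-destination pairs are over-represented. The permutation changes only which commuter gets which destination, not the number of commuters per workplace.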