Open Vayel opened 4 years ago
Sure that makes more sense! I remember using quantiles very early on in the project and it worked just as well.
Sounds like a good idea to me too.
On 2019-11-20 16:11, Max Halford wrote:
> Sure that makes more sense! I remember using quantiles very early on in the project and it worked just as well.
Looks better:
But not always:
It's not that easy!
Actually, it doesn't really make sense to focus only on quantiles, since we are talking about the mean. There's no reason why the mean should be equal to a quantile, especially for binary features (for which about 50% of the samples are equal to 0 and the rest are equal to 1).
Instead, I suggest keeping the current behaviour but telling the user when a target mean is unrealistic (like 25 on the plot below).
Perhaps we could let the user define a threshold on a criterion (like the proportion of individuals who capture 50% of the weight) and use it to filter the target means?
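The criterion mentioned above could be sketched roughly as follows. This is only an illustration: `share_holding_weight`, `is_realistic` and the `min_share` default are hypothetical names and values, not part of ethik, and `weights` stands for whatever per-sample weights are used to reach a given target mean.

```python
def share_holding_weight(weights, mass=0.5):
    """Smallest fraction of samples whose weights sum to at least `mass` of the total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    running = 0.0
    # Take the heaviest samples first: the fewer we need, the more concentrated the weights.
    for count, w in enumerate(sorted(weights, reverse=True), start=1):
        running += w
        if running >= mass * total:
            return count / len(weights)
    return 1.0


def is_realistic(weights, min_share=0.1, mass=0.5):
    """Hypothetical filter: keep a target mean only if enough samples carry the weight."""
    return share_holding_weight(weights, mass) >= min_share
```

With uniform weights, half of the samples are needed to capture 50% of the weight. When a single sample dominates, the share drops towards `1 / n` and the target mean would be flagged.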
Maybe a stupid question: shouldn't the confidence interval be large around unrealistic values?
We could also add the KDE of the values at the bottom of the plot, a bit like what is done here. This would give a visual cue of unreliable regions.
I'll check for the confidence interval.
I'd say the KDE is not sufficient. On the plot above, the original density (the black curve) has a similar value at age = 25 as at age = 38 (the original mean). Yet, shifting the mean to 25 gives a distribution that differs quite a lot from the original. The density contains all the information but is not easy enough to read, I guess.
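To put a number on "differs quite a lot", one option is to measure how far the reweighted sample is from the original one. The sketch below assumes the mean is shifted by exponentially tilting the sample weights (w_i proportional to exp(lam * x_i)); the thread does not spell out the reweighting scheme, so treat that as an assumption. `tilted_weights` and `kl_from_uniform` are hypothetical helpers.

```python
import math


def tilted_weights(xs, target, tol=1e-9, max_iter=200):
    """Weights w_i proportional to exp(lam * x_i) whose weighted mean of xs is `target`."""
    # Assumption: the shift is an exponential tilt; the thread does not state the scheme.
    lo_x, hi_x = min(xs), max(xs)
    if not lo_x < target < hi_x:
        raise ValueError("target must lie strictly between min(xs) and max(xs)")
    center = sum(xs) / len(xs)
    scale = hi_x - lo_x

    def weights_for(lam):
        # Subtract the largest exponent before exponentiating to avoid overflow.
        exps = [lam * (x - center) / scale for x in xs]
        top = max(exps)
        raw = [math.exp(e - top) for e in exps]
        total = sum(raw)
        return [r / total for r in raw]

    def mean_for(lam):
        return sum(w * x for w, x in zip(weights_for(lam), xs))

    # The weighted mean increases with lam, so bracket the root and bisect.
    lo, hi = -1.0, 1.0
    while mean_for(lo) > target:
        lo *= 2
    while mean_for(hi) < target:
        hi *= 2
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if mean_for(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return weights_for((lo + hi) / 2)


def kl_from_uniform(weights):
    """KL divergence between the reweighted sample and the original (uniform) one."""
    n = len(weights)
    return sum(w * math.log(n * w) for w in weights if w > 0)
```

Under this assumption, a target far from the original mean yields a larger divergence than a target close to it, even when the raw density at both points looks similar.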
I totally agree with this idea!
regards, Laurent
On 2019-11-21 13:46, Vincent Lefoulon wrote:
> Actually, it doesn't really make sense to focus only on quantiles, since we are talking about the mean. There's no reason why the mean should be equal to a quantile, especially for binary features (for which about 50% of the samples are equal to 0 and the rest are equal to 1).
> Instead, I suggest keeping the current behaviour but telling the user when a target mean is unrealistic (like 25 on the plot below).
> [1]
> Perhaps we could let the user define a threshold on a criterion (like the proportion of individuals who capture 50% of the weight) and use it to filter the target means?
[1] https://user-images.githubusercontent.com/6124369/69339097-e9fa9d80-0c64-11ea-9a73-1d12d4354e3c.png
To me, the confidence interval should indeed be wide when a small number of observations carry most of the weight. Showing these intervals on the standard plots is a way to check how confident we are in the curves. Warning the user when too few observations account for more than, say, 50% of the weight is also a good alternative.
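One rough way to make an interval react to weight concentration is Kish's effective sample size. The sketch below is a generic normal approximation, not ethik's own confidence interval computation; the function names are hypothetical.

```python
import math


def effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def weighted_mean_interval(values, weights, z=1.96):
    """Normal-approximation interval for a weighted mean, scaled by 1 / sqrt(n_eff)."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    # Weighted variance of the values around the weighted mean.
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    half_width = z * math.sqrt(var / effective_sample_size(weights))
    return mean - half_width, mean + half_width
```

The effective sample size equals `n` for uniform weights and approaches 1 when a single observation dominates. Note that the weighted variance can itself shrink when the weight lands on a handful of similar samples, so an interval of this kind is not guaranteed to widen.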
On 2019-11-21 13:54, Vincent Lefoulon wrote:
> I'll check for the confidence interval.
> I'd say the KDE is not sufficient. On the plot above, the original density (the black curve) has a similar value at age = 25 as at age = 38 (the original mean). Yet, shifting the mean to 25 gives a distribution that differs quite a lot from the original. The density contains all the information but is not easy enough to read, I guess.
Unfortunately, the confidence interval doesn't "work":
We should have a large interval for extreme ages. It's more or less the case for old people but not for young ones.
Did the algorithm crash, or do you believe there is another reason? We can talk about it...
On 2019-11-21 14:29, Vincent Lefoulon wrote:
> Unfortunately, the confidence interval doesn't "work":
> [1]
> We should have a large interval for extreme ages.
[1] https://user-images.githubusercontent.com/6124369/69342142-33e68200-0c6b-11ea-9163-8d15239202bc.png
No, it didn't crash. Are you here tomorrow?
Here is a plot we could do:
Yes, in the afternoon... let's talk after lunch then.
On 2019-11-21 16:35, Vincent Lefoulon wrote:
> No, it didn't crash. Are you here tomorrow?
> Here is a plot we could do:
> [1]
[1] https://user-images.githubusercontent.com/6124369/69352275-effc7880-0c7c-11ea-8874-4bf1e72a312f.png
Some plots about KDE. The chart above is the 2D explanation with ethik. The chart below is the dataset density with points sampled from it.
We can see that we can reach target means where there are no points, so the density doesn't seem to be a good criterion for finding the valid target means.
Basically, we are doing this:
It gives us:
Top: density of data samples.
Bottom: influence on fake y_pred data.
Instead, we should be doing this:
The problem is that we currently have the convention of tau == 0 being the mean. But the mean probably doesn't correspond to a quantile in q.
@MaxHalford I would suggest getting rid of taus and just talking about quantiles (with a special value to identify the original mean).
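A minimal sketch of what "quantiles plus a special value for the original mean" could look like. The `ORIGINAL_MEAN` sentinel and `target_values` are hypothetical and only illustrate the idea; they are not an existing ethik API.

```python
import statistics

ORIGINAL_MEAN = "mean"  # hypothetical sentinel standing in for the old tau == 0


def target_values(xs, levels):
    """Map quantile levels in [0, 1] (or the ORIGINAL_MEAN sentinel) to feature values."""
    ordered = sorted(xs)

    def quantile(q):
        # Linear interpolation between the two closest order statistics.
        pos = q * (len(ordered) - 1)
        low = int(pos)
        high = min(low + 1, len(ordered) - 1)
        return ordered[low] + (pos - low) * (ordered[high] - ordered[low])

    return {
        level: statistics.fmean(xs) if level == ORIGINAL_MEAN else quantile(level)
        for level in levels
    }
```

For a binary feature with 30% ones, the levels 0, 0.5 and 1 map to 0, 0 and 1, while the sentinel maps to 0.3, which is why the original mean needs its own entry rather than being assumed to sit on the quantile grid.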