geotheory opened 2 years ago
Hey, thanks for the report. That is indeed concerning. I will try to take a look later today and see what is going wrong there.
Do you have an appetite for adding a mode option to switch between exclusive and non-exclusive aggregation? I love UpSet plots, but my main criticism is that they can be extremely misleading. If the aim is to visualise the size of the intersection between two sets (i.e. a "better Venn"), they mislead when the true intersection is split across multiple bars, some of which may be pushed off-screen by a cap on their number. I think making this explicit in the function's options raises awareness of the problem as well as offering the solution.
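To make the proposed mode option concrete, here is a minimal base-R sketch of the two aggregation rules on hypothetical toy data. The names `count_intersection()` and its `mode` argument are made up for illustration and are not part of any package's API. "exclusive" counts movies whose genre set is exactly the requested combination (one bar of the plot); "inclusive" counts every movie that contains the combination, whatever other genres it also carries.

```r
# Hypothetical toy data: each element is the genre set of one movie
movies <- list(
  c("comedy", "drama"),
  c("comedy", "drama", "romance"),
  c("comedy", "drama", "short"),
  c("comedy"),
  c("drama", "romance")
)

# Hypothetical helper illustrating a `mode` switch:
#   "exclusive": genre set is exactly `sets` (one UpSet bar)
#   "inclusive": genre set contains all of `sets`, plus anything else
count_intersection <- function(movies, sets, mode = c("exclusive", "inclusive")) {
  mode <- match.arg(mode)
  hits <- vapply(movies, function(g) {
    if (mode == "exclusive") setequal(g, sets) else all(sets %in% g)
  }, logical(1))
  sum(hits)
}

count_intersection(movies, c("comedy", "drama"))                      # 1
count_intersection(movies, c("comedy", "drama"), mode = "inclusive")  # 3
```

Defaulting to "exclusive" in a sketch like this would keep the current behaviour while making the alternative visible in the function signature.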
Is this fixed by now? I never noticed anything wrong, but from now on I will be much more careful. It would be good to know whether one can simply rely on the calculations made by the package.
@z3tt - No, it's still as it was. Even if/when this is resolved, I feel this visualisation method is pretty problematic to use without clearly caveating the exclusionary nature of its summaries (i.e. its X & Y figure excludes observations that are in X, Y and Z). If you want the true X & Y figure (including observations in X, Y and Z), you need an alternative workflow, perhaps like this one (but note that it only summarises genre intersections, i.e. it omits single-genre movies):
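library(dplyr)  # tibble(), filter(), count()
library(tidyr)  # expand_grid()
# All unordered genre pairs for one movie; single-genre movies yield no rows
expand_genres = function(genres){
  if(length(genres) < 2) return(tibble(x = character(0), y = character(0)))
  expand_grid(x = genres, y = genres) |> filter(y > x)  # y > x keeps each pair once
}
# d: the movie data frame, with a list column of genres per movie
purrr::map_df(d$Genres, expand_genres) |> count(x, y, sort = TRUE)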
# A tibble: 20 × 3
x y n
<chr> <chr> <int>
1 comedy short 303
2 comedy drama 265
3 drama romance 243
4 comedy romance 206
5 animation comedy 167
6 animation short 159
7 action drama 148
8 drama short 75
9 action comedy 59
10 action romance 24
11 comedy documentary 13
12 documentary drama 10
13 romance short 10
14 action short 6
15 animation romance 5
16 documentary short 3
17 animation documentary 2
18 animation drama 2
19 action animation 1
20 documentary romance 1
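The gap between the two kinds of figure can be checked directly: an inclusive pair count is the sum of the exclusive bars for every combination that contains the pair. Below is a minimal base-R sketch on hypothetical toy data (not the movie data above), assuming genre names never contain "-", since that character is used as the separator in the combination keys.

```r
# Hypothetical toy data: each element is the genre set of one movie
movies <- list(
  c("comedy", "drama"),
  c("drama", "comedy", "romance"),
  c("comedy", "drama", "short"),
  c("comedy"),
  c("drama", "romance"),
  c("comedy", "drama")
)

# Exclusive combination key per movie, e.g. "comedy-drama-short"
keys <- vapply(movies, function(g) paste(sort(unique(g)), collapse = "-"),
               character(1))
tab <- table(keys)
exclusive <- setNames(as.integer(tab), names(tab))  # one UpSet-style bar each

# Bars whose combination contains both comedy and drama
has_both <- vapply(strsplit(names(exclusive), "-", fixed = TRUE),
                   function(g) all(c("comedy", "drama") %in% g), logical(1))

exclusive[["comedy-drama"]]  # 2: the single bar labelled comedy + drama
sum(exclusive[has_both])     # 4: the inclusive comedy + drama count
```

Here the inclusive figure is spread over three bars, which is exactly how part of it can end up off-screen when the number of bars is capped.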
The data doesn't add up as far as I can see:
The graphic shows drama + comedy as 195, whereas the actual (exclusive) intersection is 180. It seems you are lumping in the other categories that were not manually selected for the plot with the `sets` argument. But if you do this, the behaviour is inconsistent, because when the `sets` argument is omitted the categories are fully exclusive. In fact, the real (non-exclusive) drama + comedy intersection is 265.
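The suspected cause can be reproduced on toy data. This base-R sketch assumes, as suggested above but not confirmed, that passing `sets` drops each movie's unselected genres before the exclusive combinations are counted; the data, the `selected` vector and the `exclusive_key()` helper are all hypothetical.

```r
# Hypothetical toy data: each element is the genre set of one movie
movies <- list(
  c("comedy", "drama"),
  c("comedy", "drama", "short"),
  c("comedy", "drama", "romance"),
  c("drama")
)
# Hypothetical `sets` selection that leaves out "short"
selected <- c("comedy", "drama", "romance")

exclusive_key <- function(g) paste(sort(unique(g)), collapse = "-")

# Exclusive keys computed over each movie's full genre set
full_keys <- vapply(movies, exclusive_key, character(1))
# Keys after dropping unselected genres: the short film now looks like
# a pure comedy + drama film and inflates that bar
restricted_keys <- vapply(movies, function(g) exclusive_key(intersect(g, selected)),
                          character(1))

sum(full_keys == "comedy-drama")        # 1
sum(restricted_keys == "comedy-drama")  # 2
```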