JiaxiangBU / tutoring2

The collection of Python and R code scripts to tutor others.
https://jiaxiangbu.github.io/tutoring2/

spread function: key has duplicate values #32

Closed JiaxiangBU closed 4 years ago

JiaxiangBU commented 4 years ago

Jiaxiang Li 2020-02-20

library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------- tidyverse 1.2.1 --

## √ ggplot2 3.2.1     √ purrr   0.3.3
## √ tibble  2.1.3     √ dplyr   0.8.3
## √ tidyr   0.8.3     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.4.0

## Warning: package 'ggplot2' was built under R version 3.6.2

## Warning: package 'purrr' was built under R version 3.6.1

## Warning: package 'dplyr' was built under R version 3.6.1

## -- Conflicts ------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(reshape2)
## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths
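# Reproduce the problem: len ~ supp leaves several rows per cell, so dcast() falls back to counting rows.
mydata <- dcast(ToothGrowth, len ~ supp)
## Using dose as value column: use value.var to override.

## Aggregation function missing: defaulting to length
# Manual workaround: put the VC rows (1:30) and the OJ rows (31:60) side by side.
data <- cbind(ToothGrowth[1:30, ], ToothGrowth[31:60, ])
data <- data.frame(dose = data[, 3], VC = data[, 1], OJ = data[, 4])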
ToothGrowth %>% 
    group_by(dose, supp) %>% 
    count()
## # A tibble: 6 x 3
## # Groups:   dose, supp [6]
##    dose supp      n
##   <dbl> <fct> <int>
## 1   0.5 OJ       10
## 2   0.5 VC       10
## 3   1   OJ       10
## 4   1   VC       10
## 5   2   OJ       10
## 6   2   VC       10
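
To see why these repeated (dose, supp) combinations matter, here is a minimal sketch of calling spread() directly on ToothGrowth, with no extra index column. It assumes only the built-in ToothGrowth data and the tidyr package. The exact error text differs between tidyr versions, so the message is captured rather than quoted here; `result` is just an illustrative name.

```r
library(tidyr)

# Each (dose, supp) pair covers 10 rows, so spread() cannot tell
# which len value belongs in which output cell and raises an error.
result <- tryCatch(
    spread(ToothGrowth, supp, len),
    error = function(e) conditionMessage(e)
)

# On error, result holds the error message as a character string.
print(result)
```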

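Every (dose, supp) combination is repeated, but each group has exactly 10 rows, so the data are balanced. Adding an index column (the row number within each group) is enough to make the key(s) unique.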

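# Number the rows within each (dose, supp) group so that (dose, index, supp) is unique.
df <- 
ToothGrowth %>% 
    group_by(dose, supp) %>% 
    mutate(index = row_number()) %>% 
    ungroup()
# With unique keys, spread() can put len into one column per supp level.
df %>% 
    spread(supp, len)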
## # A tibble: 30 x 4
##     dose index    OJ    VC
##    <dbl> <int> <dbl> <dbl>
##  1   0.5     1  15.2   4.2
##  2   0.5     2  21.5  11.5
##  3   0.5     3  17.6   7.3
##  4   0.5     4   9.7   5.8
##  5   0.5     5  14.5   6.4
##  6   0.5     6  10    10  
##  7   0.5     7   8.2  11.2
##  8   0.5     8   9.4  11.2
##  9   0.5     9  16.5   5.2
## 10   0.5    10   9.7   7  
## # ... with 20 more rows
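
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). As a sketch only, assuming a newer tidyr than the 0.8.3 loaded above, the same reshaping could be written as follows. The name `wide` is illustrative, and the order of the OJ and VC columns may differ from the spread() result.

```r
library(dplyr)
library(tidyr)

# Rebuild the indexed data so this example is self-contained.
df <- ToothGrowth %>%
    group_by(dose, supp) %>%
    mutate(index = row_number()) %>%
    ungroup()

# names_from plays the role of spread()'s key, values_from the role of its value.
wide <- df %>%
    pivot_wider(names_from = supp, values_from = len)

print(wide)
```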
# Check uniqueness: count rows per (dose, index, supp) and report the share of keys that occur more than once.
df %>% 
    count(dose, index, supp) %>% 
    summarise(mean(n > 1))
## # A tibble: 1 x 1
##   `mean(n > 1)`
##           <dbl>
## 1             0

No key occurs more than once, so the keys are now unique.
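
The same uniqueness check can also be sketched with base R's anyDuplicated(), which returns 0 when no row repeats. This is an alternative to the count-based check, not the method used in the issue; `key_cols` and `no_dups` are illustrative names.

```r
library(dplyr)

# Rebuild the indexed data so this example is self-contained.
df <- ToothGrowth %>%
    group_by(dose, supp) %>%
    mutate(index = row_number()) %>%
    ungroup()

# Keep only the columns that together form the key.
key_cols <- as.data.frame(df[, c("dose", "index", "supp")])

# anyDuplicated() gives the position of the first duplicated row, or 0 if there is none.
no_dups <- anyDuplicated(key_cols) == 0
print(no_dups)
```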