Adding columns in R the way you would in Python

slsongge commented 4 years ago

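Problem summary

Inside a for loop in R, I want to use paste0(i, "_score") as the name of a new data frame column. It is hard to describe in words; the code should make it clearer.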


library(tidyverse)
df <- data.frame(a = 1:5)

#### What I want
df %>%
  mutate(
    A_score = a,
    B_score = a
  )

#### How to do this with a loop (this code is wrong)
for (i in LETTERS[1:2]) {

  df <- df %>% mutate(paste0(i, "_score") = a)

}

JiaxiangBU commented 4 years ago

@slsongge

Suppose we have a matrix:

df <- matrix(1:100,nrow = 10)

df

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1   11   21   31   41   51   61   71   81    91
##  [2,]    2   12   22   32   42   52   62   72   82    92
##  [3,]    3   13   23   33   43   53   63   73   83    93
##  [4,]    4   14   24   34   44   54   64   74   84    94
##  [5,]    5   15   25   35   45   55   65   75   85    95
##  [6,]    6   16   26   36   46   56   66   76   86    96
##  [7,]    7   17   27   37   47   57   67   77   87    97
##  [8,]    8   18   28   38   48   58   68   78   88    98
##  [9,]    9   19   29   39   49   59   69   79   89    99
## [10,]   10   20   30   40   50   60   70   80   90   100

Set all 10 column names at once:

library(tidyverse)

## -- Attaching packages -------------------------------------------------------------------- tidyverse 1.2.1 --

## √ ggplot2 3.2.1     √ purrr   0.3.3
## √ tibble  2.1.3     √ dplyr   0.8.3
## √ tidyr   1.0.2     √ stringr 1.4.0
## √ readr   1.3.1     √ forcats 0.4.0

## Warning: package 'ggplot2' was built under R version 3.6.2

## Warning: package 'tidyr' was built under R version 3.6.2

## Warning: package 'purrr' was built under R version 3.6.1

## Warning: package 'dplyr' was built under R version 3.6.1

## -- Conflicts ----------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

df %>% 
    `names<-`(paste0(1:10,"_score")) %>% 
    as.data.frame()

##    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1   1 11 21 31 41 51 61 71 81  91
## 2   2 12 22 32 42 52 62 72 82  92
## 3   3 13 23 33 43 53 63 73 83  93
## 4   4 14 24 34 44 54 64 74 84  94
## 5   5 15 25 35 45 55 65 75 85  95
## 6   6 16 26 36 46 56 66 76 86  96
## 7   7 17 27 37 47 57 67 77 87  97
## 8   8 18 28 38 48 58 68 78 88  98
## 9   9 19 29 39 49 59 69 79 89  99
## 10 10 20 30 40 50 60 70 80 90 100
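
The printed result still shows V1 to V10, which suggests the names were not applied: on a matrix, `names<-` sets names on the individual elements rather than on the columns. A minimal base R sketch, assuming column names are what is wanted, sets them with `colnames()` before converting (the variable name `m` is only for illustration):

```r
# Build the same 10 x 10 matrix as above
m <- matrix(1:100, nrow = 10)

# colnames<- (not names<-) is what labels the columns of a matrix
colnames(m) <- paste0(1:10, "_score")

# as.data.frame() keeps the matrix column names as they are
df <- as.data.frame(m)
names(df)
```

Names that start with a digit are not syntactic in R, so they need backticks when used with `$`, for example df$`1_score`.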

slsongge commented 4 years ago

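I think my idea follows a different logic from yours. I have a data frame df, say with a single column (the number of columns does not matter). I want to write a loop that adds one column to df per iteration, where the column name is the loop value plus a fixed string.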

df <- data.frame(A = 1:5)

for (i in c('a','b','d')) {

  paste0(i, "_score") <- df$A * 10

}

slsongge commented 4 years ago

This is my indirect workaround for the problem:

library(tidyverse)
df <- data.frame(A = 1:5)
df_tmp <- matrix(NA, ncol = 3, nrow = 5) %>%
  as.data.frame() %>%
  `names<-`(paste0(c('a','b','d'), "_score"))

df_bind <- bind_cols(df, df_tmp)

for (i in c('a','b','d')) {

  # fill only the column that belongs to the current loop value
  df_bind[, paste0(i, "_score")] <- df$A

}

df_bind

JiaxiangBU commented 4 years ago

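I have a data frame df, say with a single column (the number of columns does not matter). I want to write a loop that adds one column to df per iteration, where the column name is the loop value plus a fixed string.

Then my thinking is: since you add a column to the same df each time, the row counts necessarily match. Why not first build a temporary df_tmp that holds every column you want to add, and name the columns in one go with `names<-`? That avoids the for loop in favor of a vectorized operation. @slsongge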

slsongge commented 4 years ago

In that case, would the implementation be the code in my previous reply?

JiaxiangBU commented 4 years ago

@slsongge

library(tidyverse)
df <- data.frame(A = 1:5)
df_tmp <- matrix(NA, ncol = 3, nrow = 5) %>%
    as.data.frame() %>%
    `names<-`(paste0(c('a', 'b', 'd'), "_score"))

df_bind <- bind_cols(df, df_tmp)

for (i in c('a', 'b', 'd')) {
    # fill only the column that belongs to the current loop value
    df_bind[, paste0(i, "_score")] <- df$A

}

df_bind

##   A a_score b_score d_score
## 1 1       1       1       1
## 2 2       2       2       2
## 3 3       3       3       3
## 4 4       4       4       4
## 5 5       5       5       5

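This is my indirect workaround for the problem

If the end goal is just to produce this dataset df_bind, with every column identical, I think you can do it like this.

Say you want to add three columns.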

matrix(rep(df$A, 3 + 1),
       nrow = nrow(df)) %>%
    `colnames<-`(c(names(df), paste0(letters[1:3], "_score"))) %>%
    as.data.frame()

##   A a_score b_score c_score
## 1 1       1       1       1
## 2 2       2       2       2
## 3 3       3       3       3
## 4 4       4       4       4
## 5 5       5       5       5

# Inputs
# df
# 3, i.e. the number of columns to add
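
When the new columns are not all identical, the same loop-free idea still seems to apply. The sketch below is only an illustration under assumed values: the multipliers 1 to 3 are hypothetical and stand in for whatever each column should really contain.

```r
df <- data.frame(A = 1:5)

# One name per new column
new_names <- paste0(letters[1:3], "_score")

# lapply() builds one vector per column (here A times a hypothetical multiplier k);
# assigning the resulting list to df[new_names] creates all three columns at once
df[new_names] <- lapply(1:3, function(k) df$A * k)

df
```

Assigning a list to `df[new_names]` adds one column per list element, so no temporary data frame or bind_cols() call is needed.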

JiaxiangBU commented 4 years ago

@slsongge

What I really want to know is whether columns can be added inside a loop

df # same data as before; suppose each iteration adds one random column

##   A
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5

set.seed(123)
for (i in 1:3) {
    feature_name <- paste0(letters[i],'_score')
    df[[feature_name]] <- runif(nrow(df),0,1)
}
df

##   A   a_score   b_score   c_score
## 1 1 0.2875775 0.0455565 0.9568333
## 2 2 0.7883051 0.5281055 0.4533342
## 3 3 0.4089769 0.8924190 0.6775706
## 4 4 0.8830174 0.5514350 0.5726334
## 5 5 0.9404673 0.4566147 0.1029247
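
For the mutate() form from the original question, `mutate(paste0(i, "_score") = a)` is rejected because the left-hand side of `=` inside a call must be a plain name, not an expression. A minimal sketch, assuming a dplyr version that supports `!!` together with `:=` (to my knowledge 0.7 and later), could look like this; `new_name` is just an illustrative variable:

```r
library(dplyr)

df <- data.frame(a = 1:5)

for (i in LETTERS[1:2]) {
    new_name <- paste0(i, "_score")
    # `!!new_name :=` tells mutate() to use the string stored in new_name as the column name
    df <- df %>% mutate(!!new_name := a)
}

names(df)
```

Compared with `df[[new_name]] <- ...`, this keeps the step inside a pipeline, at the cost of the less familiar `:=` syntax.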

slsongge commented 4 years ago

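The question above is part of the one I am asking next, and I would like to see how you would implement it. The requirement: there is a list (list_01); build regression models from the data in the list, then apply each model to the data frame (df_raw) to produce fitted values of y. The code probably explains it better: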

library(tidyverse)

df_raw <- data.frame(a = 1:5, b = 1:5)
list_01 <- list(
  a = c(1:15), 
  b = c(1:15),
  d = c(1:15)
)

df_raw_new <- df_raw

# loop over the predictors only: df_raw has no column d, so "d" cannot be scored
for (i in c("a","b")) {

  #### Use d from the list as y, regress it on each of the other two as x, and record the coefficients
  assign(str_c("reg_", i), lm(list_01$d ~ eval(parse(text = str_c("list_01$", i)))))
  assign("coef1", get(str_c("reg_", i))$coefficients[1])
  assign("coef2", get(str_c("reg_", i))$coefficients[2])

  #### Apply this model to the matching column of df_raw to predict y
  iscore <- round(get("coef1") + get("coef2") * eval(parse(text = str_c("df_raw$", i))), 2)
  df_raw_new[[str_c(i, "score")]] <- iscore
}

df_raw_new

The returned result is shown in the screenshot below:

JiaxiangBU commented 4 years ago

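The question above is part of the one I am asking next, and I would like to see how you would implement it. The requirement: there is a list (list_01); build regression models from the data in the list, then apply each model to the data frame (df_raw) to produce fitted values of y. The code probably explains it better

Right, the code does look a bit complex; you prefer doing this inside a for loop rather than with vectorized operations. Also, this is essentially a loop of single-variable regressions, followed by prediction with the models for a and b? @slsongge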

slsongge commented 4 years ago

Also, this is essentially a loop of single-variable regressions, followed by prediction with the models for a and b? @slsongge

Yes, exactly.

JiaxiangBU commented 4 years ago

@slsongge

library(tidyverse)

test_df <- data.frame(a = 1:5, b = 1:5)
x <- data.frame(a = c(1:15),
          b = c(1:15))
y <- c(1:15)

for (i in seq_along(x)) {
    # fit a single-variable regression of y on the i-th training column
    x_input <- x[[i]]
    fit <- lm(y ~ x_input)
    # the newdata column must be named x_input to match the model formula
    new_data <- data.frame(x_input = test_df[[i]])
    new_name <- paste0(names(x)[i], "_score")
    test_df[[new_name]] <- predict(fit, newdata = new_data)
}

test_df

##   a b a_score b_score
## 1 1 1       1       1
## 2 2 2       2       2
## 3 3 3       3       3
## 4 4 4       4       4
## 5 5 5       5       5
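
The same per-column regression can probably be written without an explicit for loop. The following is a sketch under assumptions rather than the approach used above: it puts the training data, including the response d, in one hypothetical data frame `train`, and builds each formula with `reformulate()`.

```r
# Training data: d is the response, a and b are the single predictors
train <- data.frame(a = 1:15, b = 1:15, d = 1:15)
test_df <- data.frame(a = 1:5, b = 1:5)

predictors <- c("a", "b")

# For each predictor, fit d ~ <predictor> and predict on the test data
scores <- lapply(predictors, function(v) {
    fit <- lm(reformulate(v, response = "d"), data = train)
    unname(predict(fit, newdata = test_df))
})

# Add all score columns in one assignment
test_df[paste0(predictors, "_score")] <- scores
test_df
```

Because each model is fitted with a `data` argument, `predict()` can look up the predictor by name in `test_df`, so no renaming of the input column is needed.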

JiaxiangBU commented 4 years ago

@slsongge OK, I'll add this issue to the table of contents.

[6103971] 2020-03-18: update, use push in Makefile

JiaxiangBU / tutoring2

Adding columns in R the way you would in Python #45
