business-science / tidyquant

Bringing financial analysis to the tidyverse
https://business-science.github.io/tidyquant/

`tq_mutate` creates misleading new variable names when original name contains a dot #95

basteln3rk closed this issue 6 years ago

basteln3rk commented 6 years ago

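When the original variable name contains a dot, `tq_mutate` produces broken names for the new variables. The way the names are broken is also confusing and could lead an analyst to make errors. In the output below, the new columns are named `var.1`, `v.1`, `var.2` and `v.var.1`: the prefix `v.var` is truncated to `v` for one column but kept for another.

Minimal working example: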

> test %>% tq_mutate(
+   select = colnames(test),
+   mutate_fun = lag.xts,
+   k = 1:2
+ )
# A tibble: 10 x 7
   date         var v.var var.1   v.1 var.2 v.var.1
   <date>     <int> <int> <int> <int> <int>   <int>
 1 2017-01-01     1     1    NA    NA    NA      NA
 2 2017-01-02     2     2     1     1    NA      NA
 3 2017-01-03     3     3     2     2     1       1
 4 2017-01-04     4     4     3     3     2       2
(...)
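
The report does not show how `test` was created. A minimal sketch of an input that is consistent with the printed output might look like the following. The values are an assumption inferred from the visible rows (a `Date` column plus two integer columns, one with a dot in its name), not the reporter's actual data.

```r
library(tibble)

# ASSUMPTION: the original definition of `test` is not given in the report.
# This guess only matches what the printed output shows: 10 rows, a Date
# column, and two integer columns named `var` and `v.var`.
test <- tibble(
  date  = as.Date("2017-01-01") + 0:9,
  var   = 1:10,
  v.var = 1:10
)

colnames(test)
```

With a tibble like this, the `tq_mutate()` call above can be rerun against an installed tidyquant version to check whether the naming issue is still present.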

Session info:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 26 (Twenty Six)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2                  urca_1.3-0                    texreg_1.36.23                tikzDevice_0.10-1             reshape2_1.4.2                xtable_1.8-2                 
 [7] tidyquant_0.5.4               forcats_0.3.0                 stringr_1.2.0                 dplyr_0.7.4                   purrr_0.2.4                   readr_1.1.1                  
[13] tidyr_0.7.2                   tibble_1.4.2                  ggplot2_2.2.1                 tidyverse_1.2.1               quantmod_0.4-10               TTR_0.23-2                   
[19] PerformanceAnalytics_1.4.3541 xts_0.10-0                    zoo_1.8-0                     lubridate_1.7.2              

loaded via a namespace (and not attached):
 [1] haven_1.1.1         lattice_0.20-35     timetk_0.1.0        colorspace_1.3-2    utf8_1.1.3          rlang_0.2.0         pillar_1.2.1        foreign_0.8-69      glue_1.2.0         
[10] modelr_0.1.1        readxl_1.0.0        bindr_0.1           plyr_1.8.4          Quandl_2.8.0        munsell_0.4.3       gtable_0.2.0        cellranger_1.1.0    rvest_0.3.2        
[19] psych_1.7.8         knitr_1.16          curl_2.8.1          parallel_3.4.3      broom_0.4.3         Rcpp_0.12.15        scales_0.4.1        filehash_2.4-1      jsonlite_1.5       
[28] alphavantager_0.1.0 mnormt_1.5-5        hms_0.4.1           stringi_1.1.5       grid_3.4.3          cli_1.0.0           tools_3.4.3         magrittr_1.5        lazyeval_0.2.0     
[37] crayon_1.3.4        pkgconfig_2.0.1     xml2_1.2.0          assertthat_0.2.0    httr_1.3.1          rstudioapi_0.7      R6_2.2.2            nlme_3.1-131        compiler_3.4.3  

DavisVaughan commented 6 years ago

I've fixed this in the dev version. We now add `..1` rather than `.1` to separate out duplicate names. Thanks!

library(tidyquant)

FANG %>% 
    group_by(symbol) %>%
    select(symbol, date, adjusted) %>%
    tq_mutate(adjusted, lag.xts, k = 1:2)
#> # A tibble: 4,032 x 5
#> # Groups:   symbol [4]
#>    symbol date       adjusted lag.xts lag.xts..1
#>    <chr>  <date>        <dbl>   <dbl>      <dbl>
#>  1 FB     2013-01-02     28.0    NA         NA  
#>  2 FB     2013-01-03     27.8    28.0       NA  
#>  3 FB     2013-01-04     28.8    27.8       28.0
#>  4 FB     2013-01-07     29.4    28.8       27.8
#>  5 FB     2013-01-08     29.1    29.4       28.8
#>  6 FB     2013-01-09     30.6    29.1       29.4
#>  7 FB     2013-01-10     31.3    30.6       29.1
#>  8 FB     2013-01-11     31.7    31.3       30.6
#>  9 FB     2013-01-14     31.0    31.7       31.3
#> 10 FB     2013-01-15     30.1    31.0       31.7
#> # ... with 4,022 more rows

Created on 2018-03-07 by the reprex package (v0.1.1.9000).
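
To illustrate why a `..1` suffix is less ambiguous than `.1`, here is a small base R sketch. The helper `dedupe_names()` is hypothetical and is not tidyquant's implementation, and the incoming column names are assumed for the sake of the example. It only shows the general idea: a single-dot suffix can collide with names that already contain a dot and a number, while a double-dot suffix leaves them alone.

```r
# Hypothetical helper, NOT tidyquant's code: rename each entry of `incoming`
# that clashes with an already used name by appending `sep` and a counter.
dedupe_names <- function(existing, incoming, sep) {
  taken <- existing
  out <- character(length(incoming))
  for (i in seq_along(incoming)) {
    candidate <- incoming[i]
    n <- 0
    # Keep bumping the counter until the candidate name is unused.
    while (candidate %in% taken) {
      n <- n + 1
      candidate <- paste0(incoming[i], sep, n)
    }
    out[i] <- candidate
    taken <- c(taken, candidate)
  }
  out
}

existing <- c("date", "var", "v.var")
# Assumed names of the columns returned by the mutate function.
incoming <- c("var", "v.var", "var.1", "v.var.1")

# Single dot: the renamed `var` takes the name `var.1`, which pushes the
# incoming `var.1` to `var.1.1`, so names no longer match their origin.
dedupe_names(existing, incoming, sep = ".")

# Double dot: renamed columns get `..1`, and the incoming `var.1` and
# `v.var.1` keep their own names.
dedupe_names(existing, incoming, sep = "..")
```

The double-dot separator makes the added suffix easy to tell apart from dots that were already part of a column name.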