futureverse / future.apply

:rocket: R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org

future_mapply throws "Error in seq.int(0, to0 - from, by) : 'to' must be a finite number" #17

Closed: ThoDuyNguyen closed this issue 6 years ago

ThoDuyNguyen commented 6 years ago

read_new and read_a1 are dummy functions for testing purposes.

library(lubridate)
library(data.table)
library(future.apply)

plan(multicore)

read_new <- function(processing_date){
  return(
    seq(from=1, to=100, by=1)
  )
}

read_a1 <- function(processing_date){
  return(
    seq(from=1, to=1000, by=1)
  )
}

calculate_retetention <-
  function(f_read_new,
           f_read_a1,
           processing_date,
           period,
           file_name_summary) {

    print(period)

    checking_cohort <-
      seq(from = ymd(processing_date),
          to = ymd(processing_date) - lubridate::days(period - 1),
          by = -1)

    v_a1 <- f_read_a1(ymd(processing_date))

    l_cohort <- lapply(checking_cohort, f_read_new)

    v_count_cohort <- unlist(lapply(l_cohort, length))

    l_remaining <- mapply(
      base::intersect,
      MoreArgs = list(x = v_a1),
      y = l_cohort,
      SIMPLIFY = FALSE
    )

    v_count_remaining <- unlist(lapply(l_remaining, length))

    v_index <- seq(from = 1, to = period, by = 1)

    fwrite(
      data.table(
        cohort_date = checking_cohort,
        day_index = v_index,
        cohort_size = v_count_cohort,
        remaining_cohort = v_count_remaining,
        retention_rate = round(100*v_count_remaining / v_count_cohort, 3)
      ),
      file_name_summary,
      col.names = FALSE,
      sep = "\t",
      append = TRUE
    )
  }

running_date <- seq(from = ymd(20180601),
                    to = ymd(20180630),
                    by = 1)

file_name_summary <- "test_retention.tsv"

running_period <- 7

future_mapply(
  calculate_retetention,
  MoreArgs = list(
    f_read_new = read_new,
    f_read_a1 = read_a1,
    period = running_period,
    file_name_summary = file_name_summary
  ),
  processing_date = running_date, 
  SIMPLIFY = FALSE
)

This code throws: Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

But if I use mapply instead of future_mapply, the code runs fine.

Could you please let me know what the problem is and how to fix it?

Thank you in advance.

My session info

Session info ---------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 system   x86_64, linux-gnu           
 ui       RStudio (1.1.453)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       Asia/Ho_Chi_Minh            
 date     2018-07-17                  

Packages -------------------------------------------------------------------------------------------
 package      * version date       source        
 base         * 3.5.0   2018-05-14 local         
 codetools      0.2-15  2016-10-05 CRAN (R 3.5.0)
 compiler       3.5.0   2018-05-14 local         
 data.table   * 1.11.4  2018-05-27 CRAN (R 3.5.0)
 datasets     * 3.5.0   2018-05-14 local         
 devtools       1.13.5  2018-02-18 CRAN (R 3.5.0)
 digest         0.6.15  2018-01-28 CRAN (R 3.5.0)
 future       * 1.8.1   2018-05-03 CRAN (R 3.5.0)
 future.apply * 1.0.0   2018-06-20 CRAN (R 3.5.0)
 globals        0.12.0  2018-06-12 CRAN (R 3.5.0)
 graphics     * 3.5.0   2018-05-14 local         
 grDevices    * 3.5.0   2018-05-14 local         
 listenv        0.7.0   2018-01-21 CRAN (R 3.5.0)
 lubridate    * 1.7.4   2018-04-11 CRAN (R 3.5.0)
 magrittr       1.5     2014-11-22 CRAN (R 3.5.0)
 memoise        1.1.0   2017-04-21 CRAN (R 3.5.0)
 methods      * 3.5.0   2018-05-14 local         
 parallel       3.5.0   2018-05-14 local         
 Rcpp           0.12.17 2018-05-18 CRAN (R 3.5.0)
 stats        * 3.5.0   2018-05-14 local         
 stringi        1.2.3   2018-06-12 CRAN (R 3.5.0)
 stringr        1.3.1   2018-05-10 CRAN (R 3.5.0)
 tools          3.5.0   2018-05-14 local         
 utils        * 3.5.0   2018-05-14 local         
 withr          2.1.2   2018-03-15 CRAN (R 3.5.0)
 yaml           2.1.19  2018-05-01 CRAN (R 3.5.0)

ThoDuyNguyen commented 6 years ago

I changed the code a little, and the new version works fine:

calculate_retetention <-
  function(f_read_new,
           f_read_a1,
           processing_date,
           index_day,
           checking_cohort,
           file_name_summary) {

    v_a1 <- f_read_a1(ymd(processing_date))

    l_cohort <- lapply(checking_cohort, f_read_new)

    v_count_cohort <- unlist(lapply(l_cohort, length))

    l_remaining <- mapply(
      base::intersect,
      MoreArgs = list(x = v_a1),
      y = l_cohort,
      SIMPLIFY = FALSE
    )

    v_count_remaining <- unlist(lapply(l_remaining, length))

    fwrite(
      data.table(
        cohort_date = checking_cohort,
        day_index = index_day,
        cohort_size = v_count_cohort,
        remaining_cohort = v_count_remaining,
        retention_rate = round(100*v_count_remaining / v_count_cohort, 3)
      ),
      file_name_summary,
      col.names = FALSE,
      sep = "\t",
      append = TRUE
    )
  }

running_date <- as.list(seq(from = ymd(20180501),
                    to = ymd(20180630),
                    by = 1))

file_name_summary <- "test_retention.tsv"

running_period <- 4

runninng_cohort_date <- mapply(get_cohort_date , 
       processing_date = running_date, 
       period = running_period, 
       SIMPLIFY = FALSE)

running_index_day <- seq(from = 1, to = running_period, by = 1)

future_mapply(
  calculate_retetention,
  MoreArgs = list(
    f_read_new = read_new,
    f_read_a1 = read_a1,
    file_name_summary = file_name_summary, 
    index_day = running_index_day
  ),
  processing_date = running_date,
  checking_cohort = runninng_cohort_date,
  SIMPLIFY = FALSE
)
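The snippet above calls get_cohort_date(), whose definition is not shown in this thread. A minimal sketch of what it might look like, reconstructed from the checking_cohort computation in the first version, is given here. This is an assumption, not the author's actual helper, and it uses base R's as.Date() as a stand-in for lubridate::ymd() so that it runs without extra packages:

```r
# Hypothetical reconstruction of get_cohort_date(); the real helper is not
# shown in the thread. It mirrors the first version's checking_cohort logic:
# 'period' consecutive dates, counting backwards from processing_date.
get_cohort_date <- function(processing_date, period) {
  d <- as.Date(processing_date)          # stand-in for ymd(processing_date)
  seq(from = d, by = -1, length.out = period)
}

cohort <- get_cohort_date(as.Date("2018-06-01"), 4)
print(cohort)
```

With a period of 4 and a processing date of 2018-06-01, this returns the four dates from 2018-06-01 back to 2018-05-29, still of class Date.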

Could you please explain why the first version of the code did not work?

Kind regards

HenrikBengtsson commented 6 years ago

Thanks for the report, this looks like a bug in future_mapply(). Here's a minimal example:

library(future.apply)
X <- as.Date("2018-06-01")
y0 <- mapply(FUN = identity, X, SIMPLIFY = FALSE)
str(y0)
# List of 1
#  $ : Date[1:1], format: "2018-06-01"

y1 <- future_mapply(FUN = identity, X, SIMPLIFY = FALSE)
str(y1)
# List of 1
#  $ : num 17683

This is because future_mapply() subsets X internally using:

x <- .subset(X, 1L)
str(x)
#  num 17683

which does not dispatch on the class and therefore does not support Date objects. It should use

x <- X[1L]
str(x)
# Date[1:1], format: "2018-06-01"

instead.
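This probably also explains why the second version of the reporter's code worked: there the dates were wrapped with as.list(), and subsetting a list with .subset() keeps each element's own attributes. The following base R sketch illustrates the difference. It does not call future_mapply() itself, so the link to the package internals is an inference from the explanation above rather than something verified against the package source:

```r
X <- as.Date(c("2018-06-01", "2018-06-02"))

# Atomic Date vector: .subset() bypasses the Date method for `[`,
# so the class attribute is lost and a bare number comes back.
class(.subset(X, 1L))        # "numeric"
class(X[1L])                 # "Date"

# List of Date objects: the class lives on each element, not on the
# container, so it survives .subset() on the list.
L <- as.list(X)
class(.subset(L, 1L)[[1L]])  # "Date"
```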

HenrikBengtsson commented 6 years ago

Fixed in the develop branch, which will become the next release. To install the develop version now, see the README.

HenrikBengtsson commented 6 years ago

FYI, future.apply 1.0.1, where this is fixed, is now on CRAN. Thanks again for the report.
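For anyone who cannot upgrade yet, one possible workaround is to iterate over plain integer indices and do the Date subsetting inside the worker function, where `[` dispatches normally. The sketch below uses base mapply() so that it runs without extra packages; swapping in future_mapply() on future.apply 1.0.0 is an assumption that follows from the diagnosis above and has not been tested here. The names show_class and all_dates are illustrative only:

```r
dates <- seq(as.Date("2018-06-01"), as.Date("2018-06-03"), by = 1)

# Illustrative worker: receives an integer index plus the full Date vector
# (passed once via MoreArgs) and subsets with `[`, which keeps the class.
show_class <- function(i, all_dates) {
  d <- all_dates[i]
  class(d)
}

res <- mapply(
  show_class,
  seq_along(dates),                    # only integers are iterated over
  MoreArgs = list(all_dates = dates),  # Date vector is passed through whole
  SIMPLIFY = FALSE
)
str(res)
```

Each element of res is "Date", because the Date vector is never split up by the apply function itself.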