edwindj / ffbase

Basic (statistical) functionality for R package ff
github.com/edwindj/ffbase/wiki

error in ffdfrbind.fill() #56

Open JakubKomarek opened 5 years ago

JakubKomarek commented 5 years ago

I am trying to rbind two ffdf objects, following the example from the CRAN documentation. However, I always get this error:

Error in if (by < 1) stop("'by' must be > 0") : missing value where TRUE/FALSE needed
In addition: Warning message:
In chunk.default(from = 1L, to = 150L, by = c(logical = 46116860184273880), : NAs introduced by coercion to integer range

I have also tried using ffbase2, creating tbl.ffdf objects and then joining both data frames with dplyr, but the same error occurs.

Any advice will be appreciated.

x <- ffdfrbind.fill(as.ffdf(iris), as.ffdf(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]))

edwindj commented 5 years ago

Thanks for filing the issue. The call

x <- ffdfrbind.fill(as.ffdf(iris),
                    as.ffdf(iris[, c("Sepal.Length", "Sepal.Width",
                                     "Petal.Length")]))

works on the machines I tested (Linux and Windows).

What happens if you manually set the missing columns to NA and do an ffdfappend?

x1 <- as.ffdf(iris)
x2 <- as.ffdf(iris[, c("Sepal.Length", "Sepal.Width",
                       "Petal.Length")])
x2$Petal.Width <- ff(NA, vmode = "logical", length = nrow(x2))
x2$Species <- ff(NA, vmode = "logical", length = nrow(x2))

x <- ffdfappend(x1, x2)

Still not working?
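For comparison, the same fill-then-append idea can be tried on plain data frames, with no ff objects involved. This is only a sketch in base R, not what ffdfrbind.fill() does internally; if it runs cleanly while the ff version fails, the problem is more likely in the ff layer than in the data.

```r
# Base R only: fill the columns missing from x2 with NA, then append.
x1 <- iris
x2 <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]

# Columns present in x1 but absent from x2
missing_cols <- setdiff(names(x1), names(x2))
x2[missing_cols] <- NA

# Reorder x2 to match x1 before binding
x <- rbind(x1, x2[names(x1)])
dim(x)
```

The second half of the result holds NA in Petal.Width and Species, which is the behaviour the ff workaround above aims for.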

JakubKomarek commented 5 years ago

Hi,

Thank you for your swift answer! I am using Windows and I still get:

Error in if (by < 1) stop("'by' must be > 0") : missing value where TRUE/FALSE needed
In addition: Warning message:
In chunk.default(from = 1L, to = 150L, by = c(logical = 46116860184273880), : NAs introduced by coercion to integer range

Best wishes,

Jakub Komárek

JakubKomarek commented 5 years ago

Hi,

I tried the example in RStudio Cloud and it worked. Do you have any idea why it does not work in my local RStudio?

Thank you

Jakub

edwindj commented 5 years ago

Not at the moment. Could you post the output of sessionInfo()?
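Alongside sessionInfo(), the bitness of the running R build can be checked directly. A small sketch using only base R; the variable names are illustrative:

```r
# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build
pointer_bytes <- .Machine$sizeof.pointer

# Architecture string of the running R build
arch <- R.version$arch

cat("pointer size:", pointer_bytes, "bytes; arch:", arch, "\n")
```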

JakubKomarek commented 5 years ago

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Czech_Czechia.1250 LC_CTYPE=Czech_Czechia.1250 LC_MONETARY=Czech_Czechia.1250 LC_NUMERIC=C LC_TIME=Czech_Czechia.1250

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ffbase_0.12.7 ffbase2_0.2 dplyr_0.8.3 ff_2.2-14 bit_1.1-14

loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 rstudioapi_0.10 magrittr_1.5 usethis_1.5.1 devtools_2.1.0 tidyselect_0.2.5 pkgload_1.0.2 R6_2.4.0
[9] rlang_0.4.0 fastmatch_1.1-0 tools_3.6.1 pkgbuild_1.0.4 sessioninfo_1.1.1 cli_1.1.0 withr_2.1.2 remotes_2.1.0
[17] lazyeval_0.2.2 assertthat_0.2.1 digest_0.6.20 rprojroot_1.3-2 tibble_2.1.3 crayon_1.3.4 processx_3.4.1 purrr_0.3.2
[25] callr_3.3.1 fs_1.3.1 ps_1.3.0 testthat_2.2.1 memoise_1.1.0 glue_1.3.1 pillar_1.4.2 compiler_3.6.1
[33] desc_1.2.0 backports_1.1.4 prettyunits_1.0.2 pkgconfig_2.0.2

JakubKomarek commented 5 years ago

I just wanted to say that I am really grateful for your help!

Jakub

edwindj commented 5 years ago

I cannot reproduce the bug on Rhub (which runs on Windows 2008 SP2), but don't despair...

Technically this is in the realm of ff (not ffbase), but I do have a hunch about what the problem might be, based on the error message and a glance at the ff code (which is not mine).

ff uses chunking to process large vectors and data.frames. The size of a chunk is determined by the option "ffbatchbytes". It seems that on your Windows 10 machine(s) the value of the option isn't set correctly, maybe because you are using 32-bit R (so one option is to switch to 64-bit).
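The reported messages are consistent with an oversized chunk size. The sketch below replays the two steps in base R, using the `by` value from the warning; it imitates the check quoted in the error message and is not ff's actual code.

```r
# Value reported in the warning as the `by` argument of chunk.default()
by <- 46116860184273880

# A double beyond .Machine$integer.max cannot be represented as an integer,
# so the coercion yields NA (with the "coercion to integer range" warning)
by_int <- suppressWarnings(as.integer(by))
is.na(by_int)

# An NA condition cannot be used in if(); capture the resulting error message
msg <- tryCatch(
  if (by_int < 1) stop("'by' must be > 0"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the "'by' must be > 0" text is never reached; the failure happens while evaluating the condition itself, which points at the value that produced `by` rather than at the data.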

ff sets the option automatically when library(ff) is called. The following code is copied from ff:::.onLoad():

if (is.null(getOption("ffmaxbytes"))) {
    if (.Platform$OS.type == "windows") {
        if (getRversion() >= "2.6.0")
            options(ffmaxbytes = 0.5 * memory.limit() * (1024^2))
        else options(ffmaxbytes = 0.5 * memory.limit())
    }
    else {
        options(ffmaxbytes = 0.5 * 1024^3)
    }
}

I suggest you set options(ffmaxbytes = ...) manually and try to run the examples again.

# e.g. 500MB
options(ffmaxbytes =  500 * (1024^2))
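Before overriding anything, it may help to look at what the options currently hold. The sketch below is an assumption-laden check, not part of ff: `is_sane_bytes()` is a hypothetical helper, the 64 GB cap and the 16 MB batch size are arbitrary illustrative values, and both option names mentioned in this comment are inspected because it is not certain which one is at fault.

```r
# Hypothetical helper (not part of ff): TRUE for a single finite, positive
# number no larger than `cap` bytes. The 64 GB default cap is arbitrary.
is_sane_bytes <- function(x, cap = 64 * 1024^3) {
  is.numeric(x) && length(x) == 1L && is.finite(x) && x > 0 && x <= cap
}

# getOption() returns NULL when an option is unset (e.g. ff not loaded),
# which the helper treats as not sane.
for (opt in c("ffmaxbytes", "ffbatchbytes")) {
  value <- getOption(opt)
  cat(opt, ":", format(value), "sane:", is_sane_bytes(value), "\n")
}

# Override only what looks wrong. Sizes are illustrative, not recommendations.
if (!is_sane_bytes(getOption("ffmaxbytes"))) {
  options(ffmaxbytes = 500 * 1024^2)   # 500 MB, as suggested above
}
if (!is_sane_bytes(getOption("ffbatchbytes"))) {
  options(ffbatchbytes = 16 * 1024^2)  # 16 MB, an assumed value
}
```

Running this after library(ff) and before the ffdfrbind.fill() call shows whether either option came out as NA, infinite, or implausibly large on the affected machine.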