edwindj / ffbase

Basic (statistical) functionality for R package ff
github.com/edwindj/ffbase/wiki
35 stars 15 forks source link

bug in ffdfappend #19

Closed jwijffels closed 10 years ago

jwijffels commented 10 years ago

need to replace for (i in which(fc)) { with for (i in names(which(fc))) {

apparently physical does not necessarily return the same order of column names for each ffdf

edwindj commented 10 years ago

Do you have a running example that currently fails? So that I can test it?

jwijffels commented 10 years ago

I have something running, but it happens after a long process and with confidential data. I'm trying to make a reproducible example of this.

gaizoule commented 10 years ago

I found the same problem, and fc <- fc | sapply(physical(x), is.factor.ff) should be replaced with fc <- fc | sapply(physical(x), is.factor.ff)[names(fc)]

because names(physical(x)) and names(x) may have different order of the names

gaizoule commented 10 years ago

a reproducible example is given below:

library(ffbase) aa <- as.ffdf(data.frame(a = letters[1:3], b = 1:3, c = 4:6)) bb <- data.frame(a = letters[3:4], b = 7:8, c = c(1.2, 3.6)) aa <- ffdfappend(aa, bb) aa <- ffdfappend(aa, bb)

then we have

names(aa) [1] "a" "b" "c" names(physical(aa)) [1] "b" "c" "a"

so you should make some checking in the ffdfappend function to make it robust.

edwindj commented 10 years ago

Thanks for the reproducible code!

It is actually a ff bug:

x <- as.ffdf(iris)
x[[2]] <- x[[2]] # this asignment messes up names and physical: second column is moved to last position
names(x)
names(physical(x))

In ffdfappend it is caused by adjustvmode=TRUE, which does some column assignment (like above). Will try to fix the ffdfappend code, but not the ff code...

Best,

Edwin

edwindj commented 10 years ago

Think I fixed it: could you please check?

jwijffels commented 10 years ago

I don't think this has solved this issue. We should explicitely make sure that we are appending the right virtual column names. We should replace

for (i in which(fc)) { with for (i in names(which(fc))) {

as well as

fc <- fc | sapply(physical(x), is.factor.ff) should be replaced with fc <- fc | sapply(physical(x), is.factor.ff)[names(fc)]

adding x[names(x)] will not help. The error happens before

Jan

2013/12/13 Edwin de Jonge notifications@github.com

Think I fixed it: could you please check?

— Reply to this email directly or view it on GitHubhttps://github.com/edwindj/ffbase/issues/19#issuecomment-30500219 .

groeten/kind regards, Jan

Jan Wijffels Statistical Data Miner www.bnosac.be | +32 486 611708

edwindj commented 10 years ago

Well, I did add some extra checks, didn't they help?

fc <- fc | sapply(physical(x)[names(x)], is.factor.ff)

Best

jwijffels commented 10 years ago

maybe it is a good thing to really be sure that the names are in the right order

fc <- fc | sapply(physical(x)[names(fc)], is.factor.ff)

and replacing

for (i in which(fc)) { with for (i in names(which(fc))) {

also does this 'making sure'

2013/12/13 Edwin de Jonge notifications@github.com

Well, I did add some extra checks, didn't they help?

fc <- fc | sapply(physical(x)[names(x)], is.factor.ff)

Best

— Reply to this email directly or view it on GitHubhttps://github.com/edwindj/ffbase/issues/19#issuecomment-30512589 .

groeten/kind regards, Jan

Jan Wijffels Statistical Data Miner www.bnosac.be | +32 486 611708