Closed jwijffels closed 10 years ago
Do you have a running example that currently fails? So that I can test it?
I have something running, but it happens after a long process and with confidential data. I'm trying to make a reproducible example of this.
I found the same problem, and fc <- fc | sapply(physical(x), is.factor.ff) should be replaced with fc <- fc | sapply(physical(x), is.factor.ff)[names(fc)]
because names(physical(x)) and names(x) may have different order of the names
a reproducible example is given below:
library(ffbase) aa <- as.ffdf(data.frame(a = letters[1:3], b = 1:3, c = 4:6)) bb <- data.frame(a = letters[3:4], b = 7:8, c = c(1.2, 3.6)) aa <- ffdfappend(aa, bb) aa <- ffdfappend(aa, bb)
names(aa) [1] "a" "b" "c" names(physical(aa)) [1] "b" "c" "a"
so you should make some checking in the ffdfappend function to make it robust.
Thanks for the reproducible code!
It is actually a ff
bug:
x <- as.ffdf(iris)
x[[2]] <- x[[2]] # this asignment messes up names and physical: second column is moved to last position
names(x)
names(physical(x))
In ffdfappend it is caused by adjustvmode=TRUE
, which does some column assignment (like above).
Will try to fix the ffdfappend
code, but not the ff
code...
Best,
Edwin
Think I fixed it: could you please check?
I don't think this has solved this issue. We should explicitely make sure that we are appending the right virtual column names. We should replace
for (i in which(fc)) { with for (i in names(which(fc))) {
as well as
fc <- fc | sapply(physical(x), is.factor.ff) should be replaced with fc <- fc | sapply(physical(x), is.factor.ff)[names(fc)]
adding x[names(x)] will not help. The error happens before
Jan
2013/12/13 Edwin de Jonge notifications@github.com
Think I fixed it: could you please check?
— Reply to this email directly or view it on GitHubhttps://github.com/edwindj/ffbase/issues/19#issuecomment-30500219 .
groeten/kind regards, Jan
Jan Wijffels Statistical Data Miner www.bnosac.be | +32 486 611708
Well, I did add some extra checks, didn't they help?
fc <- fc | sapply(physical(x)[names(x)], is.factor.ff)
Best
maybe it is a good thing to really be sure that the names are in the right order
fc <- fc | sapply(physical(x)[names(fc)], is.factor.ff)
and replacing
for (i in which(fc)) { with for (i in names(which(fc))) {
also does this 'making sure'
2013/12/13 Edwin de Jonge notifications@github.com
Well, I did add some extra checks, didn't they help?
fc <- fc | sapply(physical(x)[names(x)], is.factor.ff)
Best
— Reply to this email directly or view it on GitHubhttps://github.com/edwindj/ffbase/issues/19#issuecomment-30512589 .
groeten/kind regards, Jan
Jan Wijffels Statistical Data Miner www.bnosac.be | +32 486 611708
need to replace for (i in which(fc)) { with for (i in names(which(fc))) {
apparently physical does not necessarily return the same order of column names for each ffdf