edwindj / ffbase

Basic (statistical) functionality for R package ff
github.com/edwindj/ffbase/wiki

ffdfdply generates warnings "NAs produced by integer overflow" #32

Closed edwindj closed 10 years ago

edwindj commented 10 years ago

Reported by Don Boyd

When running ffdfdply on a large ffdf, I receive warnings of the form: "In RECORDBYTES * length(fltr) : NAs produced by integer overflow". I receive one such warning per split. It does not appear to have led to any bad results.

Here's an overview of what I'm doing:

1) Reading 436 million records, 3 columns, from an SQLite database and saving them to an ffdf [2 integer columns that are categorical, and 1 numeric floating point: nodeid_a, nodeid_b, and dist]
2) Ordering the ffdf by the floating point column
3) Using ffdfdply, split by one of the integer columns, to select and return a subset of the records in each group; when setting BATCHBYTES to 3 billion

jwijffels commented 10 years ago

Thanks for reporting. This is not a serious issue: the warnings come only from printing the progress messages that show the execution of the function when trace=TRUE (the default), such as '2014-01-23 14:13:26, working on split 2/6, extracting data in RAM of 7342 split elements, totalling, NA GB, while max specified data specified using BATCHBYTES is 2.79397 GB'. The overflow affects only the size reported in that message (hence the "NA GB"), so your results are indeed correct.

In fact, RECORDBYTES * length(fltr) multiplies 2 integers (the size in RAM of 1 row of the ffdf times the number of rows that are put into RAM) only to print the result in the message. I'll update the function to convert these 2 integers to numerics so that the integer overflow no longer occurs.
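For readers who want to see the mechanism, here is a minimal base R sketch of the overflow and of the numeric conversion described above. It is not the actual ffbase code: the variable names and values are hypothetical, and the 16 bytes per row is an assumption based on two 4-byte integer columns plus one 8-byte double column.

```r
# Hypothetical values for illustration only; not taken from the ffbase source.
record_bytes <- 16L          # assumed: two 4-byte integers + one 8-byte double per row
n_rows       <- 200000000L   # assumed number of rows pulled into RAM for one split

# integer * integer stays integer in R. A product larger than
# .Machine$integer.max (2147483647) becomes NA and raises the warning
# "NAs produced by integer overflow" (suppressed here to keep output quiet).
overflowed <- suppressWarnings(record_bytes * n_rows)
is.na(overflowed)            # TRUE

# Converting an operand to double before multiplying avoids the overflow.
total_bytes <- as.numeric(record_bytes) * n_rows
total_gb    <- total_bytes / 2^30
```

With these assumed sizes, the integer product overflows once a single split holds more than about 134 million rows, which is why the reported size can turn into NA while the data itself is unaffected.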