Error in query interval

danielsjf commented 9 years ago

Hi,

I tried version 0.13 and everything looks great. I really love the new functions. However, I just ran the command:

QueryProd <- query_interval(db,
                            "Generator",
                            "Generation",
                            phase=Phase_id,
                            c("category","region","sample"))

and got the error

Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x),  : 
  Join results in 8185288 rows; more than 4497408 = max(nrow(x),nrow(i)). Check for duplicate key
values in i, each of which join to the same group in x over and over again. If that's ok, try including
`j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If 
you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for
this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

This is what the traceback looks like:

20 vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x), 
    nrow(i)))) 
19 `[.data.table`(out2, cj2, roll = TRUE) 
18 out2[cj2, roll = TRUE] 
17 query_master_each(., time, col, prop, columns, time.range, filter, 
    phase) 
16 eval(expr, envir, enclos) 
15 eval(args[[j]]$expr, envir = env) 
14 do_.grouped_df(.data, .dots = lazyeval::lazy_dots(...)) 
13 do_(.data, .dots = lazyeval::lazy_dots(...)) 
12 do(., query_master_each(., time, col, prop, columns, time.range, 
    filter, phase)) 
11 function_list[[k]](value) 
10 withVisible(function_list[[k]](value)) 
9 freduce(value, `_function_list`) 
8 `_fseq`(`_lhs`) 
7 eval(expr, envir, enclos) 
6 eval(quote(`_fseq`(`_lhs`)), env, env) 
5 withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
4 db2 %>% do(query_master_each(., time, col, prop, columns, time.range, 
    filter, phase)) 
3 query_master(db, "interval", ...) 
2 query_interval(db, "Generator", "Generation", phase = Phase_id, 
    c("category", "region", "sample")) at Create.R#198
1 CreateCSVs(folder)

There is no join command in my code, so I assume that it happens somewhere in the function. Any idea how I can fix this?

danielsjf commented 9 years ago

Update: I get the same error now also for other Plexos solutions which previously worked. I think this could have been introduced in going from 0.12 to 0.13.

eibanez commented 9 years ago

I think I know where this is happening. I will have to fix and resubmit. In the mean time, you might need to downgrade to 0.12.

danielsjf commented 9 years ago

Thanks for the fast reply. I got the old version from a colleague, but do you know why the v0.12 is not anymore on CRAN? Only the non-build version 0.12 is there.

And also, is it possible to have a note in the NEWS when you change something in the DB or use a different numbering for the DB version? Now it is difficult to guess if I should really update the solution DBs every time I use a new version of Rplexos, while sometimes it is just the functions that change.

eibanez commented 9 years ago

I will try to keep a new entry in the NEWS file as I continue developing the package.

I have not been able to reproduce this error with my test data. It might have to do with the size of the query results. Is it possible for you to send me a copy of the database so I can diagnose the problem? You can post a link here or send to eduardo [dot] ibanez [at] nrel [dot] gov

danielsjf commented 9 years ago

Ok, thanks!

Sending the model will be difficult. It is not my model and there is quite some information in it. I asked, but they said that I was not allowed to send it. Are there maybe tests that I can do to find out where the problem lies?

eibanez commented 9 years ago

Can you run the code here and let me know what gets printed on screen?

https://gist.github.com/eibanez/794021626bdbb7061ae3

danielsjf commented 9 years ago

I get an error saying:

Error in UseMethod("filter_") : 
  no applicable method for 'filter_' applied to an object of class "NULL"

Traceback:

11 filter_(.data, .dots = lazyeval::lazy_dots(...)) 
10 filter(., property == prop) 
9 function_list[[k]](value) 
8 withVisible(function_list[[k]](value)) 
7 freduce(value, `_function_list`) 
6 `_fseq`(`_lhs`) 
5 eval(expr, envir, enclos) 
4 eval(quote(`_fseq`(`_lhs`)), env, env) 
3 withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
2 db$properties[[1]] %>% filter(property == prop) 
1 query_test(db, "interval", "Generator", "Generation")

danielsjf commented 9 years ago

Sorry, I was too fast and ran the code on the wrong folder. When running it in the right folder I get the old error:

Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x),  : 
  Join results in 8185288 rows; more than 4497408 = max(nrow(x),nrow(i)). Check for duplicate key
values in i, each of which join to the same group in x over and over again. If that's ok, try including `j`
and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are 
sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error 
message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.

Traceback:

4: vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x), 
       nrow(i))))
3: `[.data.table`(out2, cj2, roll = TRUE) at #88
2: out2[cj2, roll = TRUE] at #88
1: query_test(db, "interval", "Generator", "Generation")

eibanez commented 9 years ago

Try the updated test function. What is the output? https://gist.github.com/eibanez/794021626bdbb7061ae3

danielsjf commented 9 years ago

Hi,

No more error. Instead I get the following:

key:    512
time:   8784
time2:  8784
out2:   4148865
cj2:    4497408
out3:   8185288

eibanez commented 9 years ago

The problem is that cj2 and out3 should have the same length (and they don't). Even though there is no error, you might be getting incorrect results from your query.

I updated the gist above (revision 4). Can you rerun, show me what is printed on screen and send me the 4 CSV files that get created. I added a couple instruction that mask the generator names and the values (see lines 71-72). This will give me enough information to debug but hides the proprietary data.

danielsjf commented 9 years ago

I sent you the files via wetransfer. However, I replaced the write.csv commands with save() to save them as RData files which are much more compact and the writing goes much faster (I stopped when out.csv was more than 200MB). The output was (in case this is still relevant):

> test <- query_test(db, "interval", "Generator", "Generation")
|==================================================================|100% ~0 s remaining     
min.time.data: 2025-01-01 00:00:00
max.time.data: 2026-01-01 23:00:00
|==================================================================|100% ~0 s remaining     
key:    512
time:   8784
time2:  8784
out2:   4148865
cj2:    4497408
out3:   8185288

eibanez commented 9 years ago

I got the files, but it's getting a little hard to debug. From earlier comments, it looks like you are querying multiple samples. Can you rerun the test query asking for more columns, e.g.:

test <- query_test(db, "interval", "Generator", "Generation", c("name","category","region","sample"))

Feel free to modify query_test if you want to obscure any of the additional columns.

danielsjf commented 9 years ago

Hi, I just sent you the new files. The output hasn't changed.

BenjaminWatts commented 9 years ago

Hey - does anyone else have a copy of v 0.12 they can share, my email is benjaminwwatts at gmail dot com!

eibanez commented 9 years ago

I will try to push a fix to CRAN this week. I'm trying to fix and a couple other outstanding issues.

@takje Do you have the older binary file?

BenjaminWatts commented 9 years ago

Great -- many thanks!

eibanez commented 9 years ago

@takje: I still have no clue what is happening here. Can you send me the results of the following queries? This will pull the results for a single generator, so I hope it's not an issue. You can mask the generator name if you want.

query_sql(db, "SELECT * FROM key WHERE key == 82747")
query_sql(db, "SELECT * FROM data_interval_Generator_Generation WHERE key == 82747")
query_sql(db, "SELECT * FROM Generator_Generation WHERE key == 82747")

danielsjf commented 9 years ago

@eibanez: This is the code I ran:

anonymize <- function(data){
  if('name' %in% names(data)){
    data <- data %>%
      mutate(name = name %>% factor %>% as.numeric)
  }
  if('region' %in% names(data)){
    data <- data %>%
      mutate(region = region %>% factor %>% as.numeric)
  }
  data
}
a <- query_sql(db, "SELECT * FROM key WHERE key == 82747") %>% anonymize()
b <- query_sql(db, "SELECT * FROM data_interval_Generator_Generation WHERE key == 82747") %>% anonymize()
c <- query_sql(db, "SELECT * FROM Generator_Generation WHERE key == 82747") %>% anonymize()
save(a,b,c,file='QueryData.RData')

I sent you the files again via wetransfer.

BenjaminWatts commented 9 years ago

Hey - thanks for sharing the bin - unfortunately I didn't manage to get it working, however I will look forward to hearing from you in the near future

if you'd like a sample of the code - I'd be very happy to help,, just let me know what you need , apologies I am a little new to R so some relatively good instructions might help!

On 22 April 2015 at 08:39, Jef D notifications@github.com wrote:

@eibanez https://github.com/eibanez: This is the code I ran:

anonymize <- function(data){ if('name' %in% names(data)){ data <- data %>% mutate(name = name %>% factor %>% as.numeric) } if('region' %in% names(data)){ data <- data %>% mutate(region = region %>% factor %>% as.numeric) } data } a <- query_sql(db, "SELECT * FROM key WHERE key == 82747") %>% anonymize() b <- query_sql(db, "SELECT * FROM data_interval_Generator_Generation WHERE key == 82747") %>% anonymize() c <- query_sql(db, "SELECT * FROM Generator_Generation WHERE key == 82747") %>% anonymize() save(a,b,c,file='QueryData.RData')

I sent you the files again via wetransfer.

— Reply to this email directly or view it on GitHub https://github.com/NREL/rplexos/issues/29#issuecomment-95062526.

Kind Regards,

Ben Watts

Tel: +44 7730 611821

eibanez commented 9 years ago

@BenjaminWatts: v0.13 has a known error and an upcoming release will fix it

danielsjf commented 9 years ago

@BenjaminWatts: I copied the rplexos folder straight from the library. Therefore, check where you R library folder is on your computer. For me it is C:\Users[username]\Documents\R\R-3.1.2\library, but it can be another folder on your computer, depending on where you installed it. In the library, replace the existing Rplexos folder with the one in the zip file. Updating via CRAN will overwrite the files in this folder in case a new version arrives.

eibanez commented 9 years ago

@takje I think I fixed the error. Will you let me know if it persists on the next version? I'm planning on submitting to CRAN today.

danielsjf commented 9 years ago

@eibanez: I can confirm that the results are now as expected with no errors!

Small comment though on the latest submission. When I try to install the package via install.packages, I receive an error that the source is v1.0, but the binary is 0.13.

install.packages('rplexos')

  There is a binary version available (and will be installed) but the source version is
  later:
        binary source
rplexos   0.13    1.0

trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.1/rplexos_0.13.zip'
Content type 'application/zip' length 567128 bytes (553 Kb)
opened URL
downloaded 553 Kb

package ‘rplexos’ successfully unpacked and MD5 sums checked

However, when I manually download the file from CRAN, there seems to be no problem.

eibanez commented 9 years ago

Great! It takes 1-2 days for the binary sources to be available. Looks like it should work now for Windows.

BenjaminWatts commented 9 years ago

It's working great now 0 many thanks!!! Really appreciated. Have a good weekend

On 23 April 2015 at 15:15, Eduardo Ibanez notifications@github.com wrote:

Great! It takes 1-2 days for the binary sources to be available. Looks like it should work now for Windows.

— Reply to this email directly or view it on GitHub https://github.com/NREL/rplexos/issues/29#issuecomment-95601096.

Kind Regards,

Ben Watts

Tel: +44 7730 611821

NREL / rplexos

Error in query interval #29