NREL / rplexos

Other
18 stars 13 forks source link

[Feature] summary function #20

Closed danielsjf closed 9 years ago

danielsjf commented 9 years ago

I made an R script that given a Plexos output generates some standard graphs. However currently I insert some data manually for each solution such as:

I could do a general query instead, but this is dangerous since querying a large db at once can cause a memory overload. Therefore I was looking for a function that uses the RPlexos DB as an argument and gives a summary like this (print statement). Furthermore an additional function could have arguments such as phase and item (being begin date, end date, country, generator categories,...) and would yield the desired result in an vector format.

The idea would be that you could only query the first or last line. Or use some kind of an SQL unique() function for the countries.

Additionally, you could expand the summary functions to also work on native Plexos solution DBs. One of the lacking points of Plexos is that it is hard to find out what is in their solution without effectively opening them (or having a glance at the log file which is also not very convenient).

Again these are just some blue sky features but they would be really useful! I know from experience that it is a lot of work to develop a code like this and I also know how valuable time can be :-) Thanks for all the effort!

eibanez commented 9 years ago

I'm afraid I'm not going to be able to help you much here. Useful summaries are very case and user dependent. For instance, standard generation graphs are interesting, but some people like to output "Generation" and other "Net Generation".

If you want to develop something, I'd be happy to review and incorporate, but I don't have the time to do this in the foreseeable future.

There is, however, an internal function that might be useful to you. It can be used to send a SQLite query to all the database loaded with plexos_open. Right now is an internal function and can be used as follows. I can expose this function so it accessible like the other query functions. Let me know if you'd like me to do that.

rplexos:::query_scenario(db, query)

I would recommend opening one of the solution DB's and looking at the structure. The key table has the information of all the objects and properties. There are also views for the summary tables and the interval data. You can probably write very quick queries that give you the information you want in a quick way (filter the rows you want, select the needed columns, use unique/min/max).

danielsjf commented 9 years ago

Hi eibanez,

I actually meant functions such as getSamples(db), getRegions(db), getStepSize(db), getStartDate(db,phase), getEndDate(db,phase), etc.

I started to implement a very rough function myself. I tried to find queries that were as 'cheap' as possible in terms of DB queries, but maybe it is even better to do it once when the DB is generated and do then queries to a new table in the db called for example summary.

Below is an example of the summary function:

plexos_summary <- function(folder){
  # Remove trailing '/' from folder
  folder <- sub('/?$','',folder)

  assert_that(is.character(type))
  assert_that(is.dir(folder))

  # set folder
  setwd(folder)

  # open plexos db
  db <- plexos_open()

  x2 <- rplexos:::get_table_scenario(db, "log_steps")

  # initialise time data frame
  time <- data.frame(From=as.POSIXct(array('1900-01-01 00:00:00', dim=c(4))), To=as.POSIXct(array('1900-01-01 00:00:00', dim=c(4))))
  used <- array(0,dim=c(4))

  # make db connection
  if(length(db$filename) > 1L){
    stop('More than one db file in folder')
  }
  thesql <- src_sqlite(db$filename)

  # loop over all the phases
  for(phase_x in 1:4){

    # check if properties available for this phase
    if(dim(filter(data.frame(query_property(db)),phase_id==phase_x))[1]>0L){
      used[phase_x] <- 1
      # if any properties for this phase

      # get list of property names for this phase
      t.name <- tbl(thesql, "property") %>% filter(phase_id == phase_x, collection == 'Generator', is_summary == 0) %>% collect
      the.table.name <- gsub("data_interval_", "", t.name$table_name)

      # Get first line of first property
      firstline <- head(filter(tbl(thesql,the.table.name[1]),phase_id == phase_x),1)
      prop <- firstline$property
      name_x <- firstline$name

      # retreive start date using the first line of the first property
      time[phase_x,1] <- as.POSIXct(firstline$time_from, format="%Y-%m-%d %H:%M:%S")
      phases <- c('LT Schedule', 'PASA', 'MT Schedule', 'ST Schedule')

      # retreive steps
      steps <- head(filter(x2, phase == phases[phase_x])$total_step,1)

      # retreive end date by invers sort on time and take last element (expensive operation but no alternative found)
      time[phase_x,2] <- as.POSIXct(head(arrange(filter(tbl(thesql,the.table.name[1]),phase_id == phase_x, property == prop, name == name_x),desc(time_to)),1)$time_to, format="%Y-%m-%d %H:%M:%S")

      # calculate step size
      step_size[i] <- (as.numeric(time[phase_x,2]) - as.numeric(time[phase_x,1]))/3600/steps

    }else{
      # if no properties: no start and end date
      used[phase_x] <- 0
    }
  }
  # Retreive samples
  samples <- data.frame(tbl(thesql, 'config')) %>%
    filter(element=='Sim Samples') %>%
    select(value)
  samples <- as.integer(samples)

  # Retreive countries
  if(dim(filter(query_property(db),is_summary == 0,property=='Load'))[1]>0L){
    phase_y <- filter(query_property(db),is_summary == 0,property=='Load')$phase_id[1]
    Countries <- query_interval(db,"Region","Load",phase=phase_y,c("name"),time.range=c(as.character(time[phase_x,1]),as.character(time[phase_x,1]+1))) %>% select(name)
    Countries <- Countries$name
  }else{
    stop("report setting load not defined and required for this")
  }  
  list(Countries=Countries, Samples=samples,Time=time, Steps=step_size)
}

Is this something useful to add to the package or do I better implement it myself?

eibanez commented 9 years ago

The next version will have a series of functions to query some of the information that you are looking for.

The will be called: query_time, query_band, query_phase, query_sample, query_timeslice, query_class, query_class_member, query_generator, query_region, query_zone.

If there are other queries that you need you can use these as an example. You can send me a pull request if any of them could be of general use.

I am not planning to incorporate a function like the one you suggested. It would be hard to make it useful for general use. Plus, it should be really easy to rewrite it with the new functions.