NREL / rplexos

Other
18 stars 13 forks source link

sum_year() queries are one year off when using time.range #31

Closed danielsjf closed 9 years ago

danielsjf commented 9 years ago

Hi,

Today I tried to do some yearly sums, when I noticed that sum_year is one year off. What is in my solution file:

I first queried without time.range, hoping that I would have some column: 'year' on which I could then subquery, but it appeared that there was no such column (it was the sum over the two years). Then I tried the time.range on the first year, but I got the results of the second year (consisting of only one day). Then I tried to be smart and queried on the year before but that ended in an empty query. Finally, I decided to do a query_interval and sum myself, but this is a lot slower.

In sumyear, the total sum over the two years is correct, but I think something is off in the querying using dates. And another useful feature would be to have a column 'year' saying 2014 or 2015 (and similar columns for all the other sum functions) in the results, the same as in the Plexos solution viewer.

eibanez commented 9 years ago

Can you edit and add the actual queries that you are describing?

I think you are misunderstanding the use of the sum_* functions. To get results like in the PLEXOS GUI, use the query_year function.

sum_year only uses the year table. The results should be the same you get if you use query_year and then sum the results. To differentiate by year you have to add "time" to the columns argument in any sum_ function.

danielsjf commented 9 years ago

I tried to do both a query year and simultaneously sum over the categories. I looked more into depth and the results are at least peculiar. But I also think there is something wrong in the Plexos aggregation functions.

This is the code I ran:

# Query_interval and manually select dates
a <- query_interval(db, 
                    "Generator", 
                    c("Units Started"), 
                    c("name","category","region","sample"), 
                    phase=4, 
                    filter=list(category=technologies, region=countries))
start <- as.POSIXct('2030-01-01 00:00:00')
end <- as.POSIXct('2030-12-31 23:00:00')
a <- a %>%
  filter(between(time,start,end),
         property=='Units Started',
         category %in% technologies,
         region %in% countries) %>%
  select(region,sample,category,time,name,property,value) %>%
  group_by(region, sample, category, property) %>%
  summarise(value = sum(value)) %>%
  rename(a=value)
# Sum_year and select dates in query
b <- sum_year(db, 
              "Generator", 
              c("Units Started"), 
              c("category","region","sample"), 
              phase=4, 
              time.range=c('2030-01-01 01:00:00','2030-12-31 24:00:00'), 
              filter=list(category=technologies, region=countries)) %>% 
  ungroup() %>% 
  select(-scenario, -collection) %>%
  rename(b=value)
# Sum_year and don't select dates
c <- sum_year(db, 
              "Generator", 
              c("Units Started"), 
              c("category","region","sample"), 
              phase=4, 
              filter=list(category=technologies, region=countries)) %>% 
  ungroup() %>% 
  select(-scenario, -collection) %>%
  rename(c=value)
# Query year and don't select dates and manually sum over categories
d <- query_year(db, 
                "Generator", 
                c("Units Started"), 
                c("category","region","sample"), 
                phase=4, 
                filter=list(category=technologies, region=countries)) %>%
  group_by(category, region, sample, property) %>%
  summarise(value=sum(value)) %>%
  rename(d=value)
# Query_year and don't select dates but include column time so that he sorts per year
e <- query_year(db, 
                "Generator", 
                c("Units Started"), 
                c("category","region","sample",'time'), 
                phase=4, 
                filter=list(category=technologies, region=countries)) %>%
  group_by(category, region, sample, property,time) %>%
  summarise(value=sum(value)) %>%
  dcast(category + region + sample + property ~ time)
# Sum_year and don't select dates but include column time so that he sorts per year
f <- sum_year(db, 
              "Generator", 
              c("Units Started"), 
              c("category","region","sample","time"), 
              phase=4, 
              filter=list(category=technologies, region=countries)) %>% 
  ungroup() %>% 
  select(-scenario, -collection) %>%
  rename(f=value)

total <- a %>%
  left_join(b, by=c('region','sample','category','property')) %>%
  left_join(c, by=c('region','sample','category','property')) %>%
  left_join(d, by=c('region','sample','category','property')) %>%
  left_join(e, by=c('region','sample','category','property'))

And this is the result: image

Column a has exactly the results I am looking for. These are different than the results from Plexos (column e1 and e2), while I should expect that they would be the same as column e1. But e1 and e2 are the same as the Plexos yearly results. For region 4 for example: image

This makes me think that the yearly Plexos aggregation is not working well. I checked it in the Plexos solution viewer and the last day (1th of January 2031) does not account for all the startups given in the yearly results. However, the sum over December 2030 plus the one day in January 2031 do account for the yearly results. I will contact Plexos for comments on this.

But then if I do query b, I would expect to find e1, but instead I find e2. Query c and d give the normal expected result since they sum over all time periods and are correctly the sum of e1 and e2.

And for f finally, I would expect the same result as from e, since I included the column time to sum on.

eibanez commented 9 years ago

The fact that c through f are consistent shows that the sum functions in rplexos are working correctly.

I think this might be an issue of how you are applying the time filters (e.g., '2030-12-31 24:00:00' evaluates in R to the first hour in 2031). PLEXOS might have inconsistencies in the way the summary tables are created. When in doubt, use query_interval or sum_interval.

I'm closing this, given that is not a bug.

eibanez commented 9 years ago

I think I found a possible error with the sum functions and will be fixed in the next release. I also check and the time filter works as intended now.