Weiming-Hu / AnalogsEnsemble

The C++ and R packages for parallel ensemble forecasts using Analog Ensemble
https://weiming-hu.github.io/AnalogsEnsemble/
MIT License
18 stars 5 forks source link

RAnEn function generateAnalogs failed with RStudio #32

Closed Weiming-Hu closed 5 years ago

Weiming-Hu commented 5 years ago

I have the following script.

library(RAnEn)
library(maps)

# load("forecasts_ocean.RData")
# load("observations_ocean.RData")

cat("Loading data ...\n")
if ('forecasts' %in% ls()) {
  # Don't reload data
} else {
  load("forecasts_Utah.RData")
  # load("forecasts_Denver.RData")
  # load("forecasts_ocean.RData")
}

if ('observations' %in% ls()) {
  # Don't reload data
} else {
  load("observations_Utah.RData")
  # load("observations_Denver.RData")
  # load("observations_ocean.RData")
}

# Only keep the first 4 FLTs because they are perfect forecasts
if (length(forecasts$FLTs) == 53) {
  #flts.to.keep <- c(1:4)
  #flts.to.keep <- c(1:4, 7)
  flts.to.keep <- c(1:4)
  forecasts$FLTs <- forecasts$FLTs[flts.to.keep]
  forecasts$Data <- forecasts$Data[, , , flts.to.keep, drop = F]
  rm(flts.to.keep)
} else {
  cat("FLTs have already been truncated. No changes are made to the current FLTs.")
}

# Shift Xs range
if (range(forecasts$Xs)[2] > 180) {
  forecasts$Xs <- forecasts$Xs - 360
}

# Configure start and end time
test.start <- 2997
test.end <- 3027
search.end <- test.start - 2
search.start <- search.end - 364
observation_id <- 8
# weights <- c(1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1)
weights <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0)
# weights <- rep(1, length(forecasts$ParameterNames))
names(weights) <- forecasts$ParameterNames

if (T) {
  cat("Range of test times:", format(as.POSIXct(forecasts$Times[c(
    test.start, test.end)], origin = '1970-01-01', tz = 'UTC'), format = "%Y-%m-%d"),
    "\nRange of search times:", format(as.POSIXct(forecasts$Times[c(
      search.start, search.end)], origin = '1970-01-01', tz = 'UTC'), format = "%Y-%m-%d"),
    "\nPredicted variable is", observations$ParameterNames[observation_id],
    "\nweights are:\n")
  print(weights)
}

# Generate AnEn
config <- generateConfiguration('independentSearch')

config$forecasts <- forecasts$Data
config$forecast_times <- forecasts$Times
config$flts <- forecasts$FLTs
config$search_observations <- observations$Data
config$observation_times <- observations$Times
config$observation_id <- observation_id
config$weights <- weights
config$num_members <- 20
config$verbose <- 6
config$max_par_nan <- 3
config$max_flt_nan <- 1
config$quick <- F
config$circulars <- unlist(lapply(forecasts$ParameterCirculars, function (x) {
  return(which(x == forecasts$ParameterNames))}))

config$test_times_compare <- forecasts$Times[test.start:test.end]
config$search_times_compare <- forecasts$Times[search.start:search.end]

AnEn <- generateAnalogs(config)

obs <- alignObservations(observations$Data, observations$Times, forecasts$Times, forecasts$FLTs)

I get the following results when I'm running it over NAM data. The place that generates this error message is not consistent.

OpenMP is supported.
Package 'RAnEn' version 3.2.4
Copyright (c) 2018 Weiming Hu
Loading data ...
Range of test times: 2017-04-12 2017-05-12 
Range of search times: 2016-04-10 2017-04-10 
Predicted variable is SurfaceTemperature 
weights are:
    2MetreRelativeHumidity             2MetreDewpoint 
                         0                          0 
         2MetreTemperature            SoilTemperature 
                         1                          1 
             SurfaceAlbedo         1000IsobaricInhPaU 
                         0                          0 
        1000IsobaricInhPaV         SurfaceTemperature 
                         0                          1 
           SurfacePressure            TotalCloudCover 
                         1                          0 
        TotalPrecipitation DownwardShortWaveRadiation 
                         0                          0 
 DownwardLongWaveRadiation   UpwardShortWaveRadiation 
                         1                          0 
   UpwardLongWaveRadiation     1000IsobaricInhPaSpeed 
                         1                          0 
      1000IsobaricInhPaDir 
                         0 
Convert R objects to C++ objects ...
A summary of test forecast parameters:
[Parameters] size: 17
[Parameter] ID: 0, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 1, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 2, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 3, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 4, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 5, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 6, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 7, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 8, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 9, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 10, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 11, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 12, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 13, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 14, name: UNDEFINED, weight: 1, circular: 0
[Parameter] ID: 15, name: UNDEFINED, weight: 0, circular: 0
[Parameter] ID: 16, name: UNDEFINED, weight: 0, circular: 1
Computing standard deviation ... 
corrupted size vs. prev_size
Aborted (core dumped)
Weiming-Hu commented 5 years ago

If I do not truncate the FLTs or if I have fewer than 4 FLTs, RAnEn::generateAnalogs will finish without a problem. Otherwise, for example, when I have 4 FLTs in the forecasts, the function will return the error of stack overflow.

Weiming-Hu commented 5 years ago

I noticed that if I open a new R session, and source the entire file once, it should complete. When the R session is still alive, I noticed that I have many undealt R sessions. In total there are 32 R session, exatly the number of cores I have. I suspect this has something to do with OpenMP. Although I'm not familiar of how OpenMP and Rcpp work together, the extra R sessions should be terminated when they are done.

screen shot 2019-02-22 at 11 39 59 pm

If I source the R script again, the exact same script fails this time.

Weiming-Hu commented 5 years ago

Debugging the memory issue with R + Valgrind.

R -d valgrind

reference

Weiming-Hu commented 5 years ago

Looks like there is an invalid write happening somewhere in the code.

==17485== Thread 12:                                                          
==17485== Invalid write of size 8                                                       
==17485==    at 0x1060694A: Functions::computeStandardDeviation(Forecasts_array const&, StandardDeviation&, std::vector<unsigned long, std::allocator<unsigned long> >) const [clone ._omp_fn.0] (Function
s.cpp:72)                                                                                                                                                                                                 
==17485==    by 0x75E997D: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)                                                                                                                            
==17485==    by 0x78096DA: start_thread (pthread_create.c:463)                                                                                                                                            
==17485==    by 0x558688E: clone (clone.S:95)                                                                                                                                                             
==17485==  Address 0xb562240 is 0 bytes after a block of size 1,968 alloc'd                                                       
==17485==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17485==    by 0x4F80133: ??? (in /usr/lib/R/lib/libR.so)
==17485==    by 0x4F80346: Rf_allocSExp (in /usr/lib/R/lib/libR.so)
==17485==    by 0x4F176DC: ??? (in /usr/lib/R/lib/libR.so)    
==17485==    by 0x4F84289: Rf_installChar (in /usr/lib/R/lib/libR.so)
==17485==    by 0x4FEEBDC: Rf_installTrChar (in /usr/lib/R/lib/libR.so)
==17485==    by 0x4F2920F: ??? (in /usr/lib/R/lib/libR.so)
==17485==    by 0x4F3EE4A: ??? (in /usr/lib/R/lib/libR.so)    
==17485==    by 0x4F4E1D7: Rf_eval (in /usr/lib/R/lib/libR.so)
==17485==    by 0x4F5066D: ??? (in /usr/lib/R/lib/libR.so)    
==17485==    by 0x4F4569D: ??? (in /usr/lib/R/lib/libR.so)
==17485==    by 0x4F4E1D7: Rf_eval (in /usr/lib/R/lib/libR.so)
==17485==  
Weiming-Hu commented 5 years ago

Bug resolved. This is caused when you are only using part of the times in search times to create analogs. A memory overflow is caused by wrong indexing. This is fixed.