Weiming-Hu / AnalogsEnsemble

The C++ and R packages for parallel ensemble forecasts using Analog Ensemble
https://weiming-hu.github.io/AnalogsEnsemble/
MIT License

Running deep analog error on HPC... #122

Closed lovechang1986 closed 1 year ago

lovechang1986 commented 2 years ago

Hi,

First, I ran deepAnEn for 10 epochs and got the weight data file (see the attached zip). Then I used anen_netcdf to train and test on the input and output data, but an error occurred. Please help me troubleshoot. Thanks very much.

To help you investigate, I've uploaded both the numerical model predictions and the live observations to Google Cloud separately, and you can download them here if you need to: NWPdata-input_pred2.nc and OBSdata-input_obs2.nc. Please also check the sharing invitation I sent to your email via Google Cloud. Thank you again.

I ran the command below.

./anen_netcdf --forecast-file input/input_pred2.nc --observation-file input/input_obs2.nc --out out.nc --test-start "2021-06-06 00:00:00" --test-end "2021-09-29 23:00:00" --search-start "2018-05-01 00:00:00" --search-end "2020-10-31 23:00:00" --ai-embedding output/embedding_epoch-00001.pt

deepAnen_run_log_and_the_pt_file.zip deepanen_run_config_file.zip

Weiming-Hu commented 2 years ago

Sorry for my late reply. While I'm checking on this issue, could you run the program again with --verbose 4 and paste the log?

Weiming-Hu commented 2 years ago

I have received the input_pred2.nc file but not the input_obs2.nc file. Could you also share this file? Thanks.

Weiming-Hu commented 2 years ago

When I ran the following command, I received an error message saying that the test time is not found.

> anen_netcdf --forecast-file input_pred2.nc --observation-file input_obs2.nc --out out.nc --test-start "2021-06-06 00:00:00" --test-end "2021-09-29 23:00:00" --search-start "2018-05-01 00:00:00" --search-end "2020-10-31 23:00:00" --ai-embedding embedding_epoch-00010.pt -v 4
Reading forecast file (/home/weh012/singularity_wd/input_pred2.nc) ...
entire: 1
start: 
count: 
Reading parameters ...
Reading stations ...
Reading times ...
Reading times ...
Updating dimensions for 37 parameters, 364 stations, 628 times, and 24 lead times ...
Reading variable Data ...
Reading observation file (/home/weh012/singularity_wd/input_obs2.nc) ...
entire: 1
start: 
count: 
Reading parameters ...
Reading stations ...
Reading times ...
Updating dimensions ...
Reading variable Data ...
Transforming forecasts variables to latent features with AI ...
Reading the embedding model ...
Embedding type is 1: 2-dimensional embedding [parameters, lead times]
Lead time radius for embedding: 1
Populating the tensor with 2-dimensional embeddings (parameters with lead times) ...
Feature transformation complete!36
Saving [20] latent features ...
Copying latent feature values ...
Forecast variables have been transformed to latent features!
Initialize weights to all 1s because weights in latent space do not matter!
Start AnEnIS generation ...
terminate called after throwing an instance of 'std::runtime_error'
  what():  No test indices provided. Do forecasts actually cover the test period?

I noticed that your observations run from 2017/12/31 to 2021/10/12, but your forecasts only run from 2018/3/31 to 2020/10/30. You can only generate analogs for times that have forecasts, and you can only search through times that have both forecasts and observations. Your test period (2021-06-06 to 2021-09-29) falls after the last forecast, so no test times were found. I hope this makes sense.
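To make the rule above concrete, here is a minimal Python sketch of the same coverage check. It is not the check that anen_netcdf performs internally; the function name `check_periods` is hypothetical, the times of day for the data ranges are assumed to be midnight, and lead times are ignored for simplicity.

```python
from datetime import datetime

def day(text):
    """Parse a timestamp in the same format used on the command line."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

def check_periods(forecasts, observations, test, search):
    """Return a list of problems; an empty list means the periods look usable.

    Each argument is a (start, end) tuple of datetime objects.
    """
    problems = []
    f_start, f_end = forecasts
    o_start, o_end = observations
    # Test times only need forecasts.
    if test[0] > f_end or test[1] < f_start:
        problems.append("test period does not overlap the forecasts")
    # Search times need both forecasts and observations.
    both_start, both_end = max(f_start, o_start), min(f_end, o_end)
    if search[0] > both_end or search[1] < both_start:
        problems.append("search period does not overlap forecasts and observations")
    return problems

# Data ranges as reported in this thread (midnight is an assumption).
forecasts = (day("2018-03-31 00:00:00"), day("2020-10-30 00:00:00"))
observations = (day("2017-12-31 00:00:00"), day("2021-10-12 00:00:00"))

original = check_periods(
    forecasts, observations,
    test=(day("2021-06-06 00:00:00"), day("2021-09-29 23:00:00")),
    search=(day("2018-05-01 00:00:00"), day("2020-10-31 23:00:00")),
)
revised = check_periods(
    forecasts, observations,
    test=(day("2020-06-06 00:00:00"), day("2020-09-29 23:00:00")),
    search=(day("2018-05-01 00:00:00"), day("2020-06-01 23:00:00")),
)
print(original)
print(revised)
```

With the original settings the sketch reports that the test period does not overlap the forecasts; with the revised settings it reports no problems.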

I changed the test period to the following.

> anen_netcdf --forecast-file input_pred2.nc --observation-file input_obs2.nc --out out.nc --test-start "2020-06-06 00:00:00" --test-end "2020-09-29 23:00:00" --search-start "2018-05-01 00:00:00" --search-end "2020-06-01 23:00:00" --ai-embedding embedding_epoch-00010.pt -v 4
Reading forecast file (/home/weh012/singularity_wd/input_pred2.nc) ...
entire: 1
start: 
count: 
Reading parameters ...
Reading stations ...
Reading times ...
Reading times ...
Updating dimensions for 37 parameters, 364 stations, 628 times, and 24 lead times ...
Reading variable Data ...
Reading observation file (/home/weh012/singularity_wd/input_obs2.nc) ...
entire: 1
start: 
count: 
Reading parameters ...
Reading stations ...
Reading times ...
Updating dimensions ...
Reading variable Data ...
Transforming forecasts variables to latent features with AI ...
Reading the embedding model ...
Embedding type is 1: 2-dimensional embedding [parameters, lead times]
Lead time radius for embedding: 1
Populating the tensor with 2-dimensional embeddings (parameters with lead times) ...
Feature transformation complete!36
Saving [20] latent features ...
Copying latent feature values ...
Forecast variables have been transformed to latent features!
Initialize weights to all 1s because weights in latent space do not matter!
Start AnEnIS generation ...
Computing standard deviation ...
Allocating memory ...
************** AnEn Configuration Summary **************
verbose: 4
num_analogs: 1
num_similarity: 1
observation_id: 0
max_par_nan: 0
max_flt_nan: 0
flt_radius: 1
save_analogs: 1
save_analogs_time_index: 0
save_similarity: 0
save_similarity_time_index: 0
operation: 0
quick: 0
prevent_search_future: 1
no_norm: 0
Use AI similarity: 0
weights: 1,1,1,1,1,... [total: 20]
standard deviation array dimensions: [20,364,24,1]
observations time index table dimensions: [448,24]
analogs arary dimensions: [364,116,24,1]
*********** End of AnEn Configuration Summary **********
************** AnEn Computation Summary **************
Number of stations: 364
Number of test times: 116
Number of lead times: 24
Number of search times: 448
Number of threads to be created: 80
*********** End of AnEn Computation Summary **********
Computing analogs ...
Progress: 100%
AnEnIS generation done!
Writing AnEn ...
anen_netcdf complete!

The program generates the following output file.

$ ncdump -h out.nc 
netcdf out {
dimensions:
        num_stations = 364 ;
        num_test_times = 116 ;
        num_flts = 24 ;
        num_analogs = 1 ;
        num_parameters = 20 ;
        num_search_times = 448 ;
variables:
        double analogs(num_analogs, num_flts, num_test_times, num_stations) ;
        double weights(num_parameters) ;
        double Xs(num_stations) ;
        double Ys(num_stations) ;
        uint64 test_times(num_test_times) ;
        uint64 search_times(num_search_times) ;
        uint64 FLTs(num_flts) ;
        string ParameterNames(num_parameters) ;

// global attributes:
                :_NCProperties = "version=1|netcdflibversion=4.6.0|hdf5libversion=1.10.0" ;
                :num_analogs = 1 ;
                :num_similarity = 1 ;
                :observation_id = 0 ;
                :max_par_nan = 0 ;
                :max_flt_nan = 0 ;
                :flt_radius = 1 ;
                :operation = 0 ;
                :quick = 0 ;
                :prevent_search_future = 1 ;
                :no_norm = 0 ;
                :Institute = "GEOlab @ Penn State" ;
                :Institute\ Link = "http://geolab.psu.edu" ;
                :Package = "Parallel Analog Ensemble" ;
                :Package\ Version = "v 4.4.4" ;
                :Package\ Link = "https://weiming-hu.github.io/AnalogsEnsemble" ;
                :Report\ Issues = "https://github.com/Weiming-Hu/AnalogsEnsemble/issues" ;

In addition to the time settings, you might also want to check the parameter names in your forecast file. They might surprise you.

$ ncdump -v ParameterNames input_pred2.nc 
netcdf input_pred2 {
dimensions:
        num_parameters = 37 ;
        num_circulars = 1 ;
        num_stations = 364 ;
        num_times = 628 ;
        num_flts = 24 ;
variables:
        string ParameterNames(num_parameters) ;
        string ParameterCirculars(num_circulars) ;
        float Xs(num_stations) ;
                Xs:_FillValue = NaNf ;
        float Ys(num_stations) ;
                Ys:_FillValue = NaNf ;
        float Times(num_times) ;
                Times:_FillValue = NaNf ;
        float FLTs(num_flts) ;
                FLTs:_FillValue = NaNf ;
        float Data(num_flts, num_times, num_stations, num_parameters) ;
                Data:_FillValue = NaNf ;

// global attributes:
                :_NCProperties = "version=2,netcdf=4.7.4,hdf5=1.10.6" ;
data:

 ParameterNames = 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(1., dtype=float32)\nCoordinates:\n    ParameterNames  float32 1.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(2., dtype=float32)\nCoordinates:\n    ParameterNames  float32 2.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(3., dtype=float32)\nCoordinates:\n    ParameterNames  float32 3.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(4., dtype=float32)\nCoordinates:\n    ParameterNames  float32 4.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(5., dtype=float32)\nCoordinates:\n    ParameterNames  float32 5.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(6., dtype=float32)\nCoordinates:\n    ParameterNames  float32 6.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(7., dtype=float32)\nCoordinates:\n    ParameterNames  float32 7.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(8., dtype=float32)\nCoordinates:\n    ParameterNames  float32 8.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(9., dtype=float32)\nCoordinates:\n    ParameterNames  float32 9.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(10., dtype=float32)\nCoordinates:\n    ParameterNames  float32 10.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(11., dtype=float32)\nCoordinates:\n    ParameterNames  float32 11.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(12., dtype=float32)\nCoordinates:\n    ParameterNames  float32 12.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(13., dtype=float32)\nCoordinates:\n    ParameterNames  float32 13.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(14., dtype=float32)\nCoordinates:\n    ParameterNames  float32 14.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(15., dtype=float32)\nCoordinates:\n    ParameterNames  float32 15.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(16., dtype=float32)\nCoordinates:\n    ParameterNames  float32 16.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(17., dtype=float32)\nCoordinates:\n    ParameterNames  float32 17.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(18., dtype=float32)\nCoordinates:\n    ParameterNames  float32 18.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(19., dtype=float32)\nCoordinates:\n    ParameterNames  float32 19.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(20., dtype=float32)\nCoordinates:\n    ParameterNames  float32 20.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(21., dtype=float32)\nCoordinates:\n    ParameterNames  float32 21.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(22., dtype=float32)\nCoordinates:\n    ParameterNames  float32 22.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(23., dtype=float32)\nCoordinates:\n    ParameterNames  float32 23.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(24., dtype=float32)\nCoordinates:\n    ParameterNames  float32 24.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(25., dtype=float32)\nCoordinates:\n    ParameterNames  float32 25.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(26., dtype=float32)\nCoordinates:\n    ParameterNames  float32 26.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(27., dtype=float32)\nCoordinates:\n    ParameterNames  float32 27.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(28., dtype=float32)\nCoordinates:\n    ParameterNames  float32 28.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(29., dtype=float32)\nCoordinates:\n    ParameterNames  float32 29.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(30., dtype=float32)\nCoordinates:\n    ParameterNames  float32 30.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(31., dtype=float32)\nCoordinates:\n    ParameterNames  float32 31.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(32., dtype=float32)\nCoordinates:\n    ParameterNames  float32 32.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(33., dtype=float32)\nCoordinates:\n    ParameterNames  float32 33.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(34., dtype=float32)\nCoordinates:\n    ParameterNames  float32 34.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(35., dtype=float32)\nCoordinates:\n    ParameterNames  float32 35.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(36., dtype=float32)\nCoordinates:\n    ParameterNames  float32 36.0\nAttributes:\n    units:    non", 
    "<xarray.DataArray \'ParameterNames\' ()>\narray(37., dtype=float32)\nCoordinates:\n    ParameterNames  float32 37.0\nAttributes:\n    units:    non" ;
}
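The ParameterNames values above look like string representations of xarray objects rather than plain labels. One possible cause (an assumption, not confirmed in this thread) is converting DataArray elements to strings when building the names. The sketch below uses only the Python standard library to detect such names and recover the numeric label; `looks_like_repr` and `recover_label` are hypothetical helper names, not part of xarray or AnalogsEnsemble.

```python
import re

# Matches the value inside "array(1., dtype=float32)" in a stringified DataArray.
_REPR_VALUE = re.compile(r"array\(\s*([-+0-9.eE]+)\s*,")

def looks_like_repr(name):
    """True if a parameter name is a stringified xarray object, not a label."""
    return name.startswith("<xarray.DataArray")

def recover_label(name):
    """Pull the numeric label out of a repr-style name; leave other names alone."""
    if not looks_like_repr(name):
        return name
    match = _REPR_VALUE.search(name)
    if match is None:
        return name
    return str(int(float(match.group(1))))

# First entry from the ncdump output above, with the escapes decoded.
sample = (
    "<xarray.DataArray 'ParameterNames' ()>\n"
    "array(1., dtype=float32)\n"
    "Coordinates:\n"
    "    ParameterNames  float32 1.0\n"
    "Attributes:\n"
    "    units:    non"
)
print(looks_like_repr(sample), recover_label(sample))
```

If the names are built as plain Python strings before the file is written, a check like `looks_like_repr` should return False for every entry.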

I have also just made the program available through Singularity/Docker. Please see the pre-built images here. Check out the one tagged with torch and everything will be ready to go.

lovechang1986 commented 2 years ago

Thank you very much. Regarding the times in the data files: do the forecast file times and the real-time data times need to be exactly the same? As for the data files themselves, I write the NetCDF files using xarray. For some reason ncdump shows this strange output, but the data still seems to be usable.

Weiming-Hu commented 2 years ago

> Thank you very much. Regarding the times in the data files: do the forecast file times and the real-time data times need to be exactly the same?

No. The forecast file times and the real-time data times do NOT need to be the same. The program figures out the overlapping period and checks whether your setup (i.e. the search and test periods) is valid. The program also takes care of aligning forecast initialization times and lead times with valid times.
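As a rough illustration of that alignment, a forecast's valid time is its initialization time plus its lead time, and the matching observation is the one recorded at that valid time. The Python sketch below builds a lookup table in the spirit of the "observations time index table" with dimensions [search times, lead times] shown in the log. It is not the package's actual implementation; `build_obs_index_table` is a hypothetical name, and times are assumed to be integer seconds.

```python
def build_obs_index_table(init_times, lead_times, obs_times):
    """For each (initialization time, lead time) pair, return the index of the
    observation at the valid time init + lead, or None if there is none."""
    position = {t: i for i, t in enumerate(obs_times)}
    return [
        [position.get(init + lead) for lead in lead_times]
        for init in init_times
    ]

# Two daily forecasts with three hourly lead times (all values in seconds).
init_times = [0, 86400]
lead_times = [0, 3600, 7200]
# Observations are missing for the last valid time (86400 + 7200).
obs_times = [0, 3600, 7200, 86400, 90000]

table = build_obs_index_table(init_times, lead_times, obs_times)
print(table)
```

Entries that come back as None mark forecasts with no matching observation, which is why forecasts and observations can cover different periods as long as they overlap where it matters.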

> As for the data files themselves, I write the NetCDF files using xarray. For some reason ncdump shows this strange output, but the data still seems to be usable.

I noticed the format. I'm not sure why xarray does this, but if the results come out fine, it should not be a problem.