ahernanzl / pyClim-SDM

Statistical Downscaling for Climate Change Projections with a Graphical User Interface
GNU General Public License v3.0

Error while running the model using my own prepared data #43

Closed wonr469 closed 1 week ago

wonr469 commented 4 months ago

The last line says that it could not broadcast input array from shape (60, 45, 43) into shape (1827, 45, 43). I am a freshman. I don't know what is happening or what I should do to keep it from happening again. Screenshot 2024-05-07 125643

ahernanzl commented 4 months ago

Dear user, can you check that there are no inconsistencies between the years contained in your input data and the years you have selected? You can find some guidance both in the user manual and when hovering the mouse over the dates. Please confirm whether the problem is solved. Kind regards, Alfonso
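One quick way to spot such an inconsistency is to compare the length of the time axis with the number of days in the selected period. The following is only a minimal sketch, assuming daily data on a standard Gregorian calendar; the helper name `expected_daily_steps` is hypothetical and is not part of pyClim-SDM.

```python
from datetime import date


def expected_daily_steps(first_year: int, last_year: int) -> int:
    """Days from 1 January of first_year to 31 December of last_year, inclusive."""
    return (date(last_year, 12, 31) - date(first_year, 1, 1)).days + 1


# A 1980-1984 period holds 1827 daily steps, the first dimension of the
# target shape (1827, 45, 43) reported in the error message.
print(expected_daily_steps(1980, 1984))  # 1827
```

If the time axis of an input file is shorter or longer than this count, the selected years and the file contents probably do not match.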

wonr469 commented 4 months ago

Dear Alfonso, my testing hres dataset covers 1980 to 1984. My models cover 1950 to 2014 for historical. My reanalysis data covers 1979 to 2020. My calibration years and reference years are set from 1980 to 1984. The period is short, so I chose k-fold to define how to split the calibration period for training/testing.

wonr469 commented 4 months ago

Screenshot 2024-05-08 112451

wonr469 commented 4 months ago

NEW error: TypeError: can only concatenate str (not "list") to str

ahernanzl commented 4 months ago

Dear user, your date selection seems consistent with your files. We have updated the software; please try the new version and report whether the problems remain. Nevertheless, I need to warn you that calibrating statistical downscaling with only 5 years is not advisable at all. For real applications at least 30 years should be used, otherwise the projections will not be very reliable... Kind regards, Alfonso

wonr469 commented 3 months ago

Dear Alfonso, I have changed my calibration data to at least 30 years, but a new problem appears: "domain at netcdf files do not fully contain domain at hres files". Kind regards, User

wonr469 commented 3 months ago

New calibration data dates from 1980 to 2014, 35 years.

ahernanzl commented 3 months ago

Hello, apparently you need to cover a larger domain with your netCDF files (reanalysis and/or models). Is that so?

wonr469 commented 3 months ago

The domain of netCDF file is 1980-2014, the same as the calibration dataset. The spatial range is a bit larger than the calibration dataset.

wonr469 commented 3 months ago

Shall I make the reanalysis netCDF file start in 1950 instead of 1980?

ahernanzl commented 3 months ago

No, the problem is the spatial domain. You need to cover the hres domain with your netCDFs, plus an additional border of one gridbox.

wonr469 commented 3 months ago

Sir, my hres spatial domain covers 92.2E - 102.2E, 32.25N - 36.4N; my models and reanalysis spatial domain include 91.5E - 103E, 31N - 37N.

wonr469 commented 3 months ago

Should I extend the spatial range of the models and reanalysis, given that the grid (DD) is 0.25°×0.25°?

ahernanzl commented 3 months ago

Hello, apparently your spatial domains are consistent. Let's do some checks to understand what is going on. Edit the config/advanced_settings.py file, inserting the following prints around line 764:

    if lres_max_lat < hres_max_lat or lres_min_lat > hres_min_lat or lres_max_lon < hres_max_lon or lres_min_lon > hres_min_lon:
        print(lres_max_lat, lres_min_lat, lres_max_lon, lres_min_lon)
        print(hres_max_lat, hres_min_lat, hres_max_lon, hres_min_lon)
        print('Domain at netCDF files do not fully contain domain at hres files. Please check your input files.')
        exit()
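The same containment test can be tried outside pyClim-SDM before launching the GUI. The following is a minimal standalone sketch, not the code in config/advanced_settings.py: the function name `lres_contains_hres` and the `border` parameter are hypothetical, added here to express the extra gridbox requested earlier in this thread.

```python
def lres_contains_hres(lres, hres, border=0.0):
    """Check that the low-resolution box covers the high-resolution box.

    Both boxes are (max_lat, min_lat, max_lon, min_lon) tuples in degrees.
    `border` is the extra margin, in degrees, required on every side.
    """
    lres_max_lat, lres_min_lat, lres_max_lon, lres_min_lon = lres
    hres_max_lat, hres_min_lat, hres_max_lon, hres_min_lon = hres
    return (
        lres_max_lat >= hres_max_lat + border
        and lres_min_lat <= hres_min_lat - border
        and lres_max_lon >= hres_max_lon + border
        and lres_min_lon <= hres_min_lon - border
    )


# Domains quoted earlier in this thread, with one 0.25 degree gridbox as border.
lres = (37.0, 31.0, 103.0, 91.5)
hres = (36.4, 32.25, 102.2, 92.2)
print(lres_contains_hres(lres, hres, border=0.25))  # True
```

A `True` result here only says the bounding boxes are compatible; it does not check that the gridpoints themselves coincide.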

wonr469 commented 3 months ago

Sorry to disturb you. Good news: after inserting the prints you requested, I can open gui_mode.py again. I then made the settings shown in the following screenshots: Screenshot 2024-05-23 211642 Screenshot 2024-05-23 211706 Screenshot 2024-05-23 211726 Screenshot 2024-05-23 211803 Screenshot 2024-05-23 212250 saf_lat_up and saf_lon_left are forced. A new error occurred: "could not broadcast input array from shape (12784, 21, 45) into shape (12784, 22, 45)".

ahernanzl commented 3 months ago

Hello, you are selecting a synoptic domain that does not leave an extra gridbox to the north (37°). Try changing that, and first delete the config/settings.py file just in case. I'll be waiting for your confirmation.

wonr469 commented 3 months ago

Sir, do you mean that a smaller synoptic domain should be selected, like this? Screenshot 2024-05-23 232358

wonr469 commented 3 months ago

Here is what is happening: operation0523.txt

ahernanzl commented 3 months ago

I see. There is a problem when reading the models netCDFs. Other users had the same problem because their models and the reanalysis didn't share the same grid. Is that your case? See this issue, it might help: https://github.com/ahernanzl/pyClim-SDM/issues/44

wonr469 commented 3 months ago

Sir, thanks for your guidance. I'm sorry, but some things still confuse me.

  1. How can I get the detailed contents of your input data folders, in the same format as the one provided by the poster of #44?
  2. After I added the statement "print(targetVar, pred, level, model, scene)", it printed "pr pr None ACCESS-CM2_r1i1p1f1 historical".
  3. What confuses me is how he could run missing_data_check and get "IndexError: index 0 is out of bounds for axis 0 with size 0".

ahernanzl commented 3 months ago

Hello, edit file lib/read.py, copying lines 282-283 before the 'try' at line 281 so we can see the error. Also, edit line 142, adding a line with a print(lats) statement, please. Are you sure your models and reanalysis share the same grid?

wonr469 commented 3 months ago

  1. If my model and reanalysis grids are different, what statement should I add to my Python files?
  2. I added a statement at line 142 of read.py to print lres and hres. I updated my Python files, including lib/ (1) read.py, (2) MOS_lib.py, (3) WG_lib.py and (4) precontrol.py. The error still exists. This is the latest run. Apologies for my poor English and programming abilities. operation0527.txt

ahernanzl commented 3 months ago

If your reanalysis and models grids are different there is nothing pyClim-SDM can do. You need to prepare your input_data again, remapping all netCDFs to the same grid. Are your reanalysis and models grids the same or different?

wonr469 commented 3 months ago

Confirm: if my reanalysis data grid resolution is 0.25°×0.25°, does my model data grid resolution need 0.25°×0.25° too?

ahernanzl commented 3 months ago

Yes. You need to remap all files to the same grid, downscaling models or upscaling reanalysis. cdo might help you (https://code.mpimet.mpg.de/projects/cdo)

wonr469 commented 3 months ago

Q: Does remapping all files to the same grid mean that the files must have the same grid resolution, or that every gridpoint in the different files must sit at the same latitude and longitude?

ahernanzl commented 3 months ago

All netCDF files must contain the same gridpoints

wonr469 commented 3 months ago

I see. Thanks.

wonr469 commented 3 months ago

Dear Alfonso, a cdo remap problem occurred. Do you know why this happens? cdo0612.txt Screenshot 2024-06-12 214659

ahernanzl commented 3 months ago

Try removing the blank space after remapbil. Did it work?

wonr469 commented 3 months ago

Solved it. Thank you for answering my somewhat basic question. You've been a great help.

wonr469 commented 3 months ago

Dear Alfonso,

    missing_data_check...
    ERROR retrieving dates from netCDF models files
    Make sure your input_data/models directory contains the needed files.
    At least one direct predictors is needed, not only derived predictors.

Screenshot 2024-06-13 180601

So what kind of change should I make to my model data?

ahernanzl commented 3 months ago

Hello, edit file lib/read.py, copying lines 282-283 before the 'try' at line 281 so we can see the error. Also, edit line 142, adding a line with a print(lats) statement, please. Are you sure your models and reanalysis share the same grid?

wonr469 commented 3 months ago

Sir, lines 281-292 of my read.py are:

        try:
            dates = np.ndarray.tolist(
                read.one_direct_predictor(pred, level=level, grid='ext', model=model, scene=scene)['times'])
            datesDefined = True
            break
        except:
            pass
    if datesDefined == False:
        print('ERROR retrieving dates from netCDF models files')
        print('Make sure your input_data/models directory contains the needed files.')
        print('At least one direct predictors is needed, not only derived predictors.')
        exit()

Line 142 has been changed to:

    if (lats[0] < lats[-1]) and (np.ndim(data) > 2):
        print(lres_max_lat, lres_min_lat, lres_max_lon, lres_min_lon)
        print(hres_max_lat, hres_min_lat, hres_max_lon, hres_min_lon)
        lats = np.flip(lats)
        data = np.flip(data, axis=1)

Before the problem occurred, it printed:

    pred pr get_mean_and_std_reanalysis
    37.0 31.0 103.0 91.5
    36.27 32.33 102.02 92.44
    37.0 31.0 103.0 91.5
    36.27 32.33 102.02 92.44
    spred pr get_mean_and_std_reanalysis
    37.0 31.0 103.0 91.5
    36.27 32.33 102.02 92.44
    37.0 31.0 103.0 91.5
    36.27 32.33 102.02 92.44
    saf pr get_mean_and_std_reanalysis
    37.0 31.0 103.0 91.5
    36.27 32.33 102.02 92.44
    37.0 31.0 103.0 91.5
    36.27 32.33 102.02 92.44

The range of each part:

hres: 92.44°E-102.02°E, 32.33°N-36.27°N
models: 91.5°E-103.25°E, 31°N-37.25°N, 0.25°×0.25° resolution
reanalysis: 90.9375°E-105.9375°E, 30.625°N-39.375°N, 0.25°×0.25° resolution with the help of cdo

Operation0613.txt

ahernanzl commented 3 months ago

models: 91.5°E-103.25°E, 31°N-37.25°N, 0.25°×0.25° resolution; reanalysis: 90.9375°E-105.9375°E, 30.625°N-39.375°N, 0.25°×0.25° resolution

If I understand correctly, models and reanalysis do not share the same grid. You need to project (cdo remapbil) your models to the reanalysis grid

wonr469 commented 3 months ago

Correction: reanalysis: 91.5°E-103.25°E, 31°N-37.25°N, 0.25°×0.25° resolution; models: 90.9375°E-105.9375°E, 30.625°N-39.375°N, 0.25°×0.25° resolution. Sir, the model and reanalysis grid resolutions are the same. The spatial range of the models is a little larger than that of the reanalysis. Originally, the models' grid resolution was 1.875°×1.25°, but I used remapbil to make it the same as the reanalysis one.

ahernanzl commented 3 months ago

All netCDF files must contain the same gridpoints

Let's take 90.9375°E (the models' left border) and add 0.25° repeatedly, jumping from gridpoint to gridpoint. You'll never land exactly on the reanalysis gridpoint at 91.5°E.

Is it clear this way?
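The arithmetic can be checked in a few lines. This is only an illustrative sketch; the helper `on_grid` and its tolerance are hypothetical and not part of pyClim-SDM.

```python
def on_grid(value, first, step, tol=1e-6):
    """True if `value` is reached from `first` in a whole number of `step` jumps."""
    n = (value - first) / step
    return abs(n - round(n)) < tol


# 91.5 lies 2.25 steps east of the models' left border, so it is not a gridpoint.
print(on_grid(91.5, 90.9375, 0.25))     # False
# 91.4375 lies exactly 2 steps east, so it is a gridpoint of the models' grid.
print(on_grid(91.4375, 90.9375, 0.25))  # True
```

Two files share the same gridpoints only when every coordinate of one file passes this test against the origin and step of the other, for both longitude and latitude.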

wonr469 commented 3 months ago

I got it. Let me think about what I should do next. Thank you for your great help.

wonr469 commented 2 months ago

Sir, new errors keep coming. I have tried my best to project (cdo remapbil) my models to the reanalysis grid, adding 0.25° when jumping from gridpoint to gridpoint. But it still didn't work. Operation0704.txt

ValueError: could not broadcast input array from shape (12419,21,0) into shape (12784,21,43)

    (base) node04@node04:~/Desktop/pyClim-SDM-master/input_data/models$ cdo griddes pr_ACCESS-CM2_historical_r1i1p1f1_19500101-20141231.nc
    #
    # gridID 1
    #
    gridtype  = lonlat
    gridsize  = 2100
    xsize     = 60
    ysize     = 35
    xname     = lon
    xlongname = "Longitude"
    xunits    = "degrees_east"
    yname     = lat
    ylongname = "Latitude"
    yunits    = "degrees_north"
    xfirst    = 90.9375
    xinc      = 0.25
    yfirst    = 30.625
    yinc      = 0.25
    scanningMode = 0
    cdo griddes: Processed 1 variable [8.65s 64MB]

    (base) node04@node04:~/Desktop/pyClim-SDM-master/input_data/models$ cdo griddes pr_ACCESS-CM2_ssp245_r1i1p1f1_20150101-21001231.nc
    #
    # gridID 1
    #
    gridtype  = lonlat
    gridsize  = 2100
    xsize     = 60
    ysize     = 35
    xname     = lon
    xlongname = "Longitude"
    xunits    = "degrees_east"
    yname     = lat
    ylongname = "Latitude"
    yunits    = "degrees_north"
    xfirst    = 90.9375
    xinc      = 0.25
    yfirst    = 30.625
    yinc      = 0.25
    scanningMode = 0
    cdo griddes: Processed 1 variable [92.41s 70MB]

    (base) node04@node04:~/Desktop/pyClim-SDM-master/input_data/models$ cd /home/node04/Desktop/pyClim-SDM-master/input_data/reanalysis
    (base) node04@node04:~/Desktop/pyClim-SDM-master/input_data/reanalysis$ cdo griddes tp_ERA5_19800101-20141231.nc
    #
    # gridID 1
    #
    gridtype  = lonlat
    gridsize  = 2079
    datatype  = float
    xsize     = 63
    ysize     = 33
    xname     = longitude
    xlongname = "longitude"
    xunits    = "degrees_east"
    yname     = latitude
    ylongname = "latitude"
    yunits    = "degrees_north"
    xfirst    = 90.687
    xinc      = 0.2500162
    yfirst    = 39.625
    yinc      = -0.25
    scanningMode = 0
    cdo griddes: Processed 1 variable [1.45s 50MB]
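The xfirst, xinc and xsize values reported by cdo griddes are enough to rebuild each axis and compare them. The sketch below is an illustration under assumptions: the helpers `axis` and `same_axis` are hypothetical, and the 1e-4 degree tolerance is an arbitrary choice, not a value taken from pyClim-SDM.

```python
def axis(first, inc, size):
    """Coordinates of a regular axis described by first value, increment and size."""
    return [first + i * inc for i in range(size)]


def same_axis(a, b, tol=1e-4):
    """True if both axes hold the same coordinates, ignoring their ordering."""
    a, b = sorted(a), sorted(b)
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


# Longitude axes from the griddes output of the models and the reanalysis.
models_lon = axis(90.9375, 0.25, 60)
reanalysis_lon = axis(90.687, 0.2500162, 63)
print(same_axis(models_lon, reanalysis_lon))  # False

# Latitude axes: the reanalysis runs north to south, hence the negative step.
models_lat = axis(30.625, 0.25, 35)
reanalysis_lat = axis(39.625, -0.25, 33)
print(same_axis(models_lat, reanalysis_lat))  # False
```

Both comparisons fail on the number of points alone, before the different origins and increments even come into play.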

ahernanzl commented 2 months ago

Hello, I see that the reanalysis and models still don't have the same grid. If you have problems with cdo there are dedicated forums, or maybe you know another tool for the task.

wonr469 commented 2 months ago

Progress update. I checked the attributes of the models and reanalysis files. Here they are.

`Traceback (most recent call last):
  File "/home/node04/Desktop/pyClim-SDM-master/src/.tmp_main.py", line 18, in <module>
    main()
  File "/home/node04/Desktop/pyClim-SDM-master/src/.tmp_main.py", line 9, in main
    preprocess.preprocess()
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/preprocess.py", line 43, in preprocess
    common()
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/preprocess.py", line 68, in common
    standardization.get_mean_and_std_reanalysis(targetVar, fields_and_grid)
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/standardization.py", line 67, in get_mean_and_std_reanalysis
    data = read.lres_data(targetVar, field=field, grid=grid)['data']
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/read.py", line 358, in lres_data
    data[i] = one_direct_predictor('pr', level=None, grid='ext', model=model, scene=scene)['data'][idates];
ValueError: could not broadcast input array from shape (12784,21,0) into shape (12784,21,43)

import netCDF4 as nc fn = 'pr_ACCESS-CM2_historical_r1i1p1f1_19500101-20141231.nc' ds = nc.Dataset(fn) print(ds) <class 'netCDF4._netCDF4.Dataset'> root group (NETCDF4 data model, file format HDF5): CDI: Climate Data Interface version 2.2.3 (https://mpimet.mpg.de/cdi) Conventions: CF-1.7 CMIP-6.2 source: ACCESS-CM2 (2019): aerosol: UKCA-GLOMAP-mode atmos: MetUM-HadGEM3-GA7.1 (N96; 192 x 144 longitude/latitude; 85 levels; top level 85 km) atmosChem: none land: CABLE2.5 landIce: none ocean: ACCESS-OM2 (GFDL-MOM5, tripolar primarily 1deg; 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m) ocnBgchem: none seaIce: CICE5.1.2 (same grid as ocean) institution: CSIRO (Commonwealth Scientific and Industrial Research Organisation, Aspendale, Victoria 3195, Australia), ARCCSS (Australian Research Council Centre of Excellence for Climate System Science) activity_id: CMIP branch_method: standard branch_time_in_child: 0.0 branch_time_in_parent: 0.0 creation_date: 2019-11-09T02:20:30Z data_specs_version: 01.00.30 experiment: all-forcing simulation of the recent past experiment_id: historical external_variables: areacella forcing_index: 1 frequency: day further_info_url: https://furtherinfo.es-doc.org/CMIP6.CSIRO-ARCCSS.ACCESS-CM2.historical.none.r1i1p1f1 grid: native atmosphere N96 grid (144x192 latxlon) grid_label: gn history: Thu Jun 13 10:26:34 2024: cdo remapbil,docu01.txt input.nc output.nc 2019-11-09T02:20:30Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards. 
initialization_index: 1 institution_id: CSIRO-ARCCSS mip_era: CMIP6 nominal_resolution: 250 km notes: Exp: CM2-historical; Local ID: bj594; Variable: pr (['fld_s05i216']) parent_activity_id: CMIP parent_experiment_id: piControl parent_mip_era: CMIP6 parent_source_id: ACCESS-CM2 parent_time_units: days since 0950-01-01 parent_variant_label: r1i1p1f1 physics_index: 1 product: model-output realization_index: 1 realm: atmos run_variant: forcing: GHG, Oz, SA, Sl, Vl, BC, OC, (GHG = CO2, N2O, CH4, CFC11, CFC12, CFC113, HCFC22, HFC125, HFC134a) source_id: ACCESS-CM2 source_type: AOGCM sub_experiment: none sub_experiment_id: none table_id: day table_info: Creation Date:(30 April 2019) MD5:e14f55f257cceafb2523e41244962371 title: ACCESS-CM2 output prepared for CMIP6 variable_id: pr variant_label: r1i1p1f1 version: v20191108 cmor_version: 3.4.0 tracking_id: hdl:21.14100/27a2a033-bc7e-45ff-8c30-02bf65722aaf license: CMIP6 model data produced by CSIRO is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law. 
CDO: Climate Data Operators version 2.2.1 (https://mpimet.mpg.de/cdo) dimensions(sizes): time(23741), bnds(2), lon(60), lat(35) variables(dimensions): float64 time(time), float64 time_bnds(time, bnds), float64 lon(lon), float64 lat(lat), float32 pr(time, lat, lon) groups: fn = '/home/node04/Desktop/pyClim-SDM-master/input_data/reanalysis/tp_ERA5_19800101-20141231.nc' ds = nc.Dataset(fn) print(ds) <class 'netCDF4._netCDF4.Dataset'> root group (NETCDF4 data model, file format HDF5): CDI: Climate Data Interface version 2.2.3 (https://mpimet.mpg.de/cdi) Conventions: CF-1.6 history: Sun Jul 07 23:54:05 2024: cdo daysum mertp1980.nc tpsum1980.nc 2024-07-04 09:20:45 GMT by grib_to_netcdf-2.28.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data4/adaptor.mars.internal-1720084818.869458-19807-1-8b2b23f7-84df-464d-ba58-373bf65a694e.nc /cache/tmp/8b2b23f7-84df-464d-ba58-373bf65a694e-adaptor.mars.internal-1720084168.613236-19807-1-tmp.grib frequency: day CDO: Climate Data Operators version 2.2.1 (https://mpimet.mpg.de/cdo) dimensions(sizes): time(12784), bnds(2), longitude(63), latitude(39) variables(dimensions): int32 time(time), int64 time_bnds(time, bnds), float32 longitude(longitude), float32 latitude(latitude), float32 tp(time, latitude, longitude)

` Does anyone know how to make these files have the same array shape? Comments are warmly welcome.

ahernanzl commented 1 month ago

Dear user, you can use CDO (https://code.mpimet.mpg.de/projects/cdo) to remap your files so both reanalysis and models share the exact same grid. Let's say you want to remap your models to the reanalysis grid. Then you just have to use the following CDO command:

    cdo remapbil,reanalysis_file.nc model_file.nc out.nc

Be aware that there is no blank space after 'remapbil,'. Hope it helps!
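When the command is launched from a script, building the argument list explicitly makes it hard to slip a blank space in by accident. This is only a sketch: the helper `remapbil_command` is hypothetical, and the file names are the placeholders used in the command above.

```python
import shlex


def remapbil_command(target_grid, infile, outfile):
    """Build the argument list for `cdo remapbil,<target_grid> <infile> <outfile>`."""
    # The operator and its target grid form ONE argument, joined by a comma
    # with no blank space in between.
    return ["cdo", f"remapbil,{target_grid}", infile, outfile]


cmd = remapbil_command("reanalysis_file.nc", "model_file.nc", "out.nc")
print(shlex.join(cmd))  # cdo remapbil,reanalysis_file.nc model_file.nc out.nc
```

With CDO installed, the list can be handed to `subprocess.run(cmd, check=True)` to run the remapping.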

wonr469 commented 4 weeks ago

I have a question. When I download the models file, its longitudes look like: 90.9375E, 92.8125E, ..., 104.0625E, 105.9375E (1.875° interval). The reanalysis longitudes look like: 89.062E, 90.9371E, 92.8122E, ..., 104.0628E, 105.9379E, 107.813E, 109.6881E (1.8751° interval). If the intervals differ by only 0.0001°, can your pyClim software accept it and run smoothly?

wonr469 commented 1 week ago

ERROR retrieving dates from netCDF reanalysis files At least one direct predictors is needed, not only derived predictors. Does that mean at least two files of different types are needed in the reanalysis directory?

ahernanzl commented 1 week ago

What predictors are you selecting?

wonr469 commented 1 week ago

pr. You mean I should select more than one predictor?

wonr469 commented 1 week ago

By the way, Mr Alfonso, may I know your email address so that I can respond and ask questions more promptly?

ahernanzl commented 1 week ago

That error comes from lines 302-328 of lib/read.py. Those lines are meant to retrieve dates from netCDF reanalysis files, but something is going wrong. In order to find out what it is, you will need to modify the code and see what happens. Please copy these two lines of code after line 303:

    aux_times = one_direct_predictor(var, grid='ext', model=model, scene=scene)['times']
    print(aux_times)

and select only precipitation as target variable and as the only predictor. Then tell us what you get. As for the email, I would rather keep communicating through issues, so that it can help other users, and because I am not the only one answering.

Kind regards

wonr469 commented 1 week ago

Dear Alfonso, I have copied the two lines of code you mentioned after line 303. Here is the response.

Traceback (most recent call last):
  File "/home/node04/Desktop/pyClim-SDM-master/src/.tmp_main.py", line 18, in <module>
    main()
  File "/home/node04/Desktop/pyClim-SDM-master/src/.tmp_main.py", line 9, in main
    preprocess.preprocess()
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/preprocess.py", line 43, in preprocess
    common()
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/preprocess.py", line 68, in common
    standardization.get_mean_and_std_reanalysis(targetVar, fields_and_grid)
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/standardization.py", line 67, in get_mean_and_std_reanalysis
    data = read.lres_data(targetVar, field=field, grid=grid)['data']
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/read.py", line 305, in lres_data
    aux_times = one_direct_predictor(var, grid='ext', model=model, scene=scene)['times']
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/read.py", line 201, in one_direct_predictor
    nc = netCDF(pathIn, filename, ncVar, grid=grid, level=level)
  File "/home/node04/Desktop/pyClim-SDM-master/src/../lib/read.py", line 81, in netCDF
    calendar = nc.variables[time_name].calendar
KeyError: 'time'

ahernanzl commented 1 week ago

Ok, for some reason the netCDF reanalysis file hasn't got a time variable named 'time'. Let's see if the time variable is named with any other name. For this purpose, edit the same file, uncommenting lines 55-60 and show me what happens
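Independently of pyClim-SDM, the name of the time variable can be looked up by searching for CF-style time units such as "hours since 1900-01-01". This is a minimal sketch under assumptions: the helper `find_time_names` is hypothetical, and the variable name `valid_time` in the stand-in data is only an example, not something read from the user's file.

```python
from types import SimpleNamespace


def find_time_names(variables):
    """Return the names of variables whose units look like CF time units.

    `variables` is any mapping of name -> object with an optional `units`
    attribute, such as the `variables` dict of a netCDF4.Dataset.
    """
    return [
        name
        for name, var in variables.items()
        if " since " in str(getattr(var, "units", ""))
    ]


# Stand-in for a real file, so the sketch runs without netCDF4 installed.
fake_variables = {
    "valid_time": SimpleNamespace(units="hours since 1900-01-01 00:00:00"),
    "latitude": SimpleNamespace(units="degrees_north"),
    "tp": SimpleNamespace(units="m"),
}
print(find_time_names(fake_variables))  # ['valid_time']
```

On a real file the call would be `find_time_names(netCDF4.Dataset(path).variables)`, assuming the netCDF4 package is available.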