Multiple run, visualisation, exportation

SARRA-cropmodels / SARRA-Py

GNU General Public License v3.0

5 stars 1 forks source link

Multiple run, visualisation, exportation #6

Open ambenguepy opened 1 week ago

ambenguepy commented 1 week ago

Hello, I use Sarra-py to do a simulation over several years from 2010 to 2016 with standard settings. My simulation works and I manage to visualize the map that I put here in PJ. Only in the file that I export it is in 3 dimension x, y and time = 275. I guess this is the final output. But I want to have the yield for each year from 2010 to 2016 and I have the impression that it is an average yield that is stored. Need help to visualize the yield of these 7 years on a panel and to export this data.

Sincerely

Mon script for year in range(2010,2016):

# we pass the year to date_start
date_start = datetime.date(year,4,1)
duration = 365-(date_start-datetime.date(date_start.year,1,1)).days

# we tap into the climate data that we already have retrieved, given that this folder has all the necessary data
#rainfall_data_path = "/mnt/g/Mon Drive/CIRAD/thèse Asse Mbengue/donnees_climato_pluvio_preparees/Rainfall/"
#climate_data_path = "/mnt/g/Mon Drive/CIRAD/thèse Asse Mbengue/donnees_climato_pluvio_preparees/Ex_climat_2016_SN/"
rainfall_data_path = "/home/assembengue/SARRA-Py/data/donnees_climato_pluvio_preparees_ok/donnees_climato_pluvio_preparees/Rainfall/"
climate_data_path = "/home/assembengue/SARRA-Py/data/donnees_climato_pluvio_preparees_ok/donnees_climato_pluvio_preparees/Ex_climat_2016_SN/"

grid_width, grid_height = get_grid_size(rainfall_data_path, date_start, duration)
print("The grid is {} pixels wide by {} pixels high.".format(grid_width, grid_height))

# initialize empty xarray dataset
base_data = xr.Dataset()

# load rainfall and weather data
base_data = load_TAMSAT_data(base_data, rainfall_data_path, date_start, duration)
base_data = load_AgERA5_data(base_data, climate_data_path, date_start, duration)

# load soil parameters
base_data = load_iSDA_soil_data_alternate(base_data, grid_width, grid_height)

# compute day length raster
base_data = calc_day_length_raster_fast(base_data, date_start, duration)

# parameter file names
file_paramVariete = "rice_sativa_F95_variety.yaml"
#file_paramITK = "rice_senegal_2016.yaml"
file_paramITK = "rice_senegal.yaml"
file_paramTypeSol = "USA_iowa_V42.yaml" # default, will be overwritten by soil water holding capacity map

# load variety, cropping system and soil parameters
paramVariete, paramITK, paramTypeSol = load_YAML_parameters(file_paramVariete, file_paramITK, file_paramTypeSol)

# we force the correction of the sowing date
paramITK["DateSemis"] = datetime.date(year,5,1)

# creato,g simulation xarray dataset by copying the base data
data = base_data.copy()

# initializing all the necessary variables
data = initialize_simulation(data, grid_width, grid_height, duration, paramVariete, paramITK, date_start)
data = initialize_default_irrigation(data)
data = calculate_once_daily_thermal_time(data, paramVariete)

![Capture d’écran du 2024-06-17 15-54-01](https://github.com/SARRA-cropmodels/SARRA-Py/assets/173065082/c0a251fe-7d6e-40b8-8768-d85145a4fd3f)    #! /!\ if you want to keep the results, you should export them, i.e. by doing :
#! /!\ data.to_netcdf("results_{}.nc".format(year))

codename5281 commented 1 week ago

Hello @ambenguepy ,

Thanks for reaching out.

The sequence of instructions contained in the provided script aims:

first, at loading a base simulation dataset : loading rainfall data via load_TAMSAT_data(), climatic data via load_AgERA5_data(), soil parameters via load_iSDA_soil_data_alternate(), calculating day length via calc_day_length_raster_fast(), and loading simulation parameters via load_YAML_parameters()
second, at initializing the simulation dataset (via initialize_simulation(), initialize_default_irrigation(), and calculate_once_daily_thermal_time())

However, there is no run_model() line, hence the simulation itself will not be run.

Could you first confirm that you effectively launched the simulation ?

Thanks, J.

ambenguepy commented 1 week ago

yes i run model here :+1:

data = run_model(paramVariete, paramITK, paramTypeSol, data, duration)
    # Exporting the 'rdt' variable to a NetCDF file
    data['rdt'].to_netcdf("results_rdt_{}.nc".format(year))
    #! /!\ warning : here the results are not exported whatsoever, which means the data DataArray will be overwritten at each loop iteration
    #! /!\ if you want to keep the results, you should export them, i.e. by doing :
    #! /!\ data.to_netcdf("results_{}.nc".format(year))

&nd the result is on .png. But is not yiels for all years (7) Capture d’écran du 2024-06-17 15-54-01

codename5281 commented 1 week ago

Hello again @ambenguepy ,

From what we discussed in person, let me remind you that your data object stores all the variables and results for the considered year.

In your case, by passing data['rdt'].to_netcdf("results_rdt_{}.nc".format(year)), you save the whole 3D (dimensions = (time, x, y)) yield matrix (rdt = rendement = yield) containing the yield simulation for the whole year.

Hence, if you want to run the simulation for multiple years, and saving only the final yield, your code should look more like this:

for year in range(2016,2019):

    date_start = datetime.date(year,4,1)
    duration = 365-(date_start-datetime.date(date_start.year,1,1)).days

    rainfall_data_path = "/path/to/rainfall/data"
    climate_data_path = "/path/to/climate/data/"

    grid_width, grid_height = get_grid_size(rainfall_data_path, date_start, duration)
    print("The grid is {} pixels wide by {} pixels high.".format(grid_width, grid_height))

    base_data = xr.Dataset()
    base_data = load_TAMSAT_data(base_data, rainfall_data_path, date_start, duration)
    base_data = load_AgERA5_data(base_data, climate_data_path, date_start, duration)
    base_data = load_iSDA_soil_data_alternate(base_data, grid_width, grid_height)
    base_data = calc_day_length_raster_fast(base_data, date_start, duration)

    file_paramVariete = "rice_sativa_F95_variety.yaml"
    file_paramITK = "rice_senegal_2016.yaml"
    file_paramTypeSol = "USA_iowa_V42.yaml"
    paramVariete, paramITK, paramTypeSol = load_YAML_parameters(file_paramVariete, file_paramITK, file_paramTypeSol)

    paramITK["DateSemis"] = datetime.date(year,5,1)

    data = base_data.copy()
    data = initialize_simulation(data, grid_width, grid_height, duration, paramVariete, paramITK, date_start)
    data = initialize_default_irrigation(data)
    data = calculate_once_daily_thermal_time(data, paramVariete)

    data = run_model(paramVariete, paramITK, paramTypeSol, data, duration)

    data["rdt"][-1,:,:].to_netcdf("yield_end_sim_{}.nc".format(year))

This should generate a series of simulated yields for years 2016 to 2018, and you will be able to load them back in a xarray DataArray by performing something like :

# list .nc files in the results folder
list_files = os.listdir("/path/to/results/")

# initialize empty xarray dataset
results_data = xr.Dataset()

# loop over the files
for file in list_files:
    data = xr.open_dataset("/path/to/results/" + file)
    results_data = xr.concat([results_data, data], dim="time")

Can you try that ? Let me know. Best, J.

ambenguepy commented 1 week ago

i try it but only one file is generate. And if i likle to see dimensions of this file results_data.time the array is note the year xarray.DataArray'time'time: 2 array([0, 1]) Coordinates: spatial_ref () int64 ... Indexes: (0) Attributes: (0)

codename5281 commented 1 week ago

Is the script throwing some errors during execution of the loop ? If yes, which ones ? Are you sure only one file is generated ? Did you try to refresh your file explorer ?

As for the year not being displayed as dimensions, it is expected as when loading the generated results files with the proposed method, only the index of the slice along the time dimension is used. So if your time series is 1996 ->1999 your results_data[0,:,:] will be 1996, results_data[1,:,:] will be 1997 and so on. You can still add vector metadata afterwards but ensure everything is working fine first.

ambenguepy commented 1 week ago

yes i have an error when i load Ageraclimate data . the error is :

ValueError                                Traceback (most recent call last)
Cell In[21], line 15
     13 base_data = xr.Dataset()
     14 base_data = load_TAMSAT_data(base_data, rainfall_data_path, date_start, duration)
---> 15 base_data = load_AgERA5_data(base_data, climate_data_path, date_start, duration)
     16 base_data = load_iSDA_soil_data_alternate(base_data, grid_width, grid_height)
     17 base_data = calc_day_length_raster_fast(base_data, date_start, duration)

[edit : error message truncated for clarity]

ValueError: cannot reindex or align along dimension 'time' because of conflicting dimension sizes: {274, 275}

# list .nc files in the results folder
list_files = o

codename5281 commented 1 week ago

---> 15 base_data = load_AgERA5_data(base_data, climate_data_path, date_start, duration)

This means you have an error being thrown when loading your climatic data. You should add a line at the beginning of the loop printing the year the simulation is being performed on, so that when you have an error you can go check in your climate file what is wrong.

To debug, you can also run the simulation manually one year at a time to check which year is faulty.

Just to check, do you have the data from all years in the same folder ?

ambenguepy commented 1 week ago

yes all rainfall data is in same folder and climate data in same other folder

codename5281 commented 1 week ago

Still, this indicates there is an issue with your climate data. Please proceed manually year by year to identify which year is causing the issue, independently from the loop.

codename5281 commented 1 week ago

For the record, @ambenguepy contacted me directly, stating the following :

When I simulate for 2016, I get a consistent result, but for the other years, the yield is zero everywhere, which surprises me because all my data are of the same order of magnitude. I am providing the download link for 4 years of data so that if you have time, you can run the simulation for these 4 years to see if you get a result on your side. I am also including the notebook for the 2016 simulation, which worked well. Since I need to validate the calibration with my 10 years of observed yields first, I can run 10 simulations. However, with the projection data, we will need to run simulations from 1983 to 2100 (multiple runs).

The user provided rainfall and climate data for the period 2011 to 2015 included, however without his notebook.

As suggested, i tried to run simulations from his data:

2011 thrown the following error:


0%|          | 0/5 [00:00<?, ?it/s]
2m_temperature_24_hour_maximum
2m_temperature_24_hour_mean
20%|██        | 1/5 [00:11<00:45, 11.29s/it]

ValueError Traceback (most recent call last) Input In [13], in <cell line: 6>() 5 base_data = load_TAMSAT_data(base_data, rainfall_data_path, date_start, duration) ----> 6 base_data = load_AgERA5_data(base_data, climate_data_path, date_start, duration) 8 # load soil parameters [...] ValueError: cannot reindex or align along dimension 'time' because of conflicting dimension sizes: {274, 275}


This error means there is a discrepency in between the time dimension of `base_data` xarray, and the dimension of the dataset being loaded. As the time dimension is initially set using the number of daily rainfall files for the considered year, we first check for missing files in our variables folders. After checking, it appears that there are 364 rainfall files for 2011, and 365 climate files. Hence, the user should correct its rainfall files and add the one missing date.

- 2012 to 2015 ran without issue

The encountered error was then expected. Under the current version of SARRA-Py, that is not explicit about the cause of encountered errors, users should be careful when preparing their datasets, ensuring good naming convention of both folders and files, and making sure no data is missing.

ambenguepy commented 1 week ago

Hello, yes I replaced the missing data for 2011. and I no longer have any error messages. Still, for 2012 to 2015 which worked for you, the results are not consistent. I say without a doubt that it is as you said in private. So how do I update the sowing date because at my home I just kept all the parameters of the 2016 simulation and in 2015 in the definition of the simulation dates I modified as follows: date_start = datetime.date(2015,4,1) duration = 365-(date_start-datetime.date(date_start.year,1,1)).days

codename5281 commented 1 week ago

Hello @ambenguepy ,

As suggested in the loop calculation code example from this previous answer, you do want in your calculation loop to update the dates of both date_start and paramITK["DateSemis"] parameters so the sowing date is updated in accordance with the year you are performing your simulation for. Basically, you will have to add a line after the load_YAML_parameters() call, that could be as follows :

for year in range(2016,2019):

    date_start = datetime.date(year,4,1)
    duration = 365-(date_start-datetime.date(date_start.year,1,1)).days

    [...]

    paramVariete, paramITK, paramTypeSol = load_YAML_parameters(file_paramVariete, file_paramITK, file_paramTypeSol)

    # updates the sowing date by replacing the year from the yaml config file with the current year in the loop
    paramITK["DateSemis"] = datetime.date(year,5,1) 

    [...]

As you can see, dates are datetime.date objects which are easy to manipulate so you can change at your convenience the dates of beginning of simulation and/or the date at which sowing success is evaluated.