ec-jrc / lisflood-code

Lisflood OS - LISFLOOD
https://ec-jrc.github.io/lisflood
European Union Public License 1.2
111 stars 46 forks source link

Generate Model Ensembles #156

Open KumaGIS opened 1 month ago

KumaGIS commented 1 month ago

Dear Developers, I am currently using the LISFLOOD model to assess discharge uncertainty for my study area. I have been working with the test dataset that you previously provided and attempting to perform stochastic modeling.

To begin, I wanted to evaluate the uncertainty related to the forcing data. Specifically, I tried to add random Gaussian noise to the precipitation data by modifying the readmeteo.py module's source code. Despite adding the random noise to the precipitation variable, the resultant discharge values do not show any variation.

I have updated the settings.xml file to enable the Monte Carlo Simulation option. Additionally, I explicitly initialized the constructor of the Lisflood_MonteCarlo.py within the LisfloodModel() in main.py.

The sample directories were created as specified in the EnsMembers section of the settings.xml file, and dis_run.tss files were generated within each folder. However, when I plot these files, there is no variation in the discharge values.

The following code was added to the readmeteo.py module to perturb the precipitation data

mean = 1
std_dev = 0.15
netcdf_noise = np.random.normal(mean , std_dev , self.var.Precipitation.shape)

# Adding noice to the precipitation data
self.var.Precipitation = self.var.Precipitation + netcdf_noise
Ensure that no precipitation values are below 0
self.var.Precipitation = np.maximum(self.var.Precipitation, 0)

After applying some random noise as above I checked whether the precipitation values were changed or not. They were successfully perturbed.

I executed only the run_lat_lon.xml file with the perturbed precipitation data. Was that the reason for getting similar discharge output time series values, since it uses the same initial conditions (lzavin.map and avgdis.map) which were created by the pre_run_lat_lon.xml ? Or Do I have to execute pre_run_lat_lon.xml with the Monte Carlo Simulation?

Could you please tell me, what can be the possible issue here and how to solve this? I would greatly appreciate any help or suggestions you can offer to resolve this issue.

Looking forward to hearing from you soon.

Best Regards

StefaniaGrimaldi commented 1 month ago

Dear @KumaGIS ,

Thanks for your enquiry.

The MonteCarlo module is not currently used within our operational workflow. Therefore, we cannot guarantee systematic assistance on such type of investigations. Nevertheless, we will do our best to provide support and we hope that, by joining our forces, we will succeed in the analysis!

If our understanding is correct, discharge values in each output folder are bit identical, despite the edits that you added to the source code.

Generally speaking, prerun and run must always use the same meteorological forcings, static maps, and parameters. The prerun computes the initial conditions for the run: https://ec-jrc.github.io/lisflood-code/3_step5_model-initialisation/. Albeit this is true and required for the correct use of LISFLOOD, it is unlikely the cause of your issue.

We read in your message that you verified the expected changes in the precipitation. How was this verification done? We would like to recommend activating the following output flags in the lisflood xml settings file.

<setoption choice="1" name="repPrecipitationMaps"/>
<setoption choice="1" name="repRainMaps"/>
<setoption choice="1" name="repSnowMaps"/>

You will then find in each output folder the precipitation data used within the simulation: total precipitation, and the segregated components rain and snow. You might need to check whether the options above are already implemented in your .xml settings: you can find an example in https://github.com/ec-jrc/lisflood-code/blob/master/src/lisfloodSettings_reference.xml LINES 100 and 3544. Please do not hesitate to let us know whether you need help with this task.

Furthermore, could you please let us know at which line of the readmeteo.py you added your edits? This information will allow us to try to reproduce your issue.

Finally, are you running the python package or the source code (python src/lisf1.py settings.xml)?

Kind regards, on behalf of the developers team, Carlo and Stefania

KumaGIS commented 1 month ago

Dear @StefaniaGrimaldi , Thank you for your response. I greatly appreciate your reply, even though this question is related to a part that is not currently in use.

  1. I use the source code (python src/lisf1.py settings.xml) for testing purposes. Following your instructions, I enabled the specified options and ran the model. However, as soon as the first simulation ended, the model terminated automatically, and I received the following error message:

    File "/home/kumudu/LISFLOOD/lisflood-code/src/lisflood/global_modules/output.py", line 298, in write return self.writer.write(self._start_date, self._rep_steps) File "/home/kumudu/LISFLOOD/lisflood-code/src/lisflood/global_modules/output.py", line 159, in write nf1.variables[self.map_name][step, :, :] = uncompress_array(data) File "src/netCDF4/_netCDF4.pyx", line 5505, in netCDF4._netCDF4.Variable.setitem File "src/netCDF4/_netCDF4.pyx", line 5788, in netCDF4._netCDF4.Variable._put File "src/netCDF4/_netCDF4.pyx", line 2029, in netCDF4._netCDF4._ensure_nc_success RuntimeError: NetCDF: Index exceeds dimension bound.

When I checked the output folders, I noticed that the output rainfall maps and discharge time series data were generated only in the first folder. Because of this, I used the tp.nc precipitation NetCDF stack file separately and applied the perturbations as mentioned earlier. I then plotted the time series values at the outlet point to verify whether the rainfall values were correctly perturbed. By plotting the time series values of the perturbed rainfall data, I found that each Monte Carlo run generated different precipitation values. However, the issue was that despite the precipitation values being perturbed, this variation was not reflected in the discharge time series values in each folder.

  1. I modified the readmeteo.py module by adding the perturbation part at the end of the dynamic() method.

def dynamic(self): """ dynamic part of the readmeteo module read meteo input maps """ settings = LisSettings.instance() option = settings.options binding = settings.binding

    # ************************************************************
    # ***** READ METEOROLOGICAL DATA *****************************
    # ************************************************************
    if option['readNetcdfStack']:

        step = self.var.currentTimeStep() - self.var.firstTimeStep()
        # date = date_from_step(binding, self.var.currentTimeStep())

        # Read from NetCDF stack files

        self.var.Precipitation = self.forcings['PrecipitationMaps'][step] * self.var.DtDay * self.var.PrScaling
        self.var.Tavg = self.forcings['TavgMaps'][step]
        self.var.ETRef = self.forcings['ET0Maps'][step] * self.var.DtDay * self.var.CalEvaporation
        self.var.EWRef =self.forcings['E0Maps'][step] * self.var.DtDay * self.var.CalEvaporation

    else:
        # Read from stack of maps in Pcraster format

        self.var.Precipitation = readmapsparse(binding['PrecipitationMaps'], self.var.currentTimeStep(), self.var.Precipitation) * self.var.DtDay * self.var.PrScaling
        # precipitation (conversion to [mm] per time step)
        self.var.Tavg = readmapsparse(binding['TavgMaps'], self.var.currentTimeStep(), self.var.Tavg)
        # average DAILY temperature (even if you are running the model on say an hourly time step) [degrees C]
        self.var.ETRef = readmapsparse(binding['ET0Maps'], self.var.currentTimeStep(), self.var.ETRef) * self.var.DtDay * self.var.CalEvaporation
        # daily reference evapotranspiration (conversion to [mm] per time step)
        # potential evaporation rate from a bare soil surface (conversion to [mm] per time step)
        self.var.EWRef = readmapsparse(binding['E0Maps'], self.var.currentTimeStep(), self.var.EWRef) * self.var.DtDay * self.var.CalEvaporation
        # potential evaporation rate from water surface (conversion to [mm] per time step)

    self.var.ESRef = (self.var.EWRef + self.var.ETRef)/2

    if option['TemperatureInKelvin']:
        self.var.Tavg -= 273.15

    # Generate random noise with mean 1 and standard deviation 0.1
    # Initialize array to store perturbed data

    # *************************************************************
    # ****** ADD RANDOM NOISE TO METEOROLOGICAL DATA **************
    # *************************************************************
    # Generate random noise with mean 1 and standard deviation 0.1
    mean_pr = 1
    std_dev_pr = 0.15

    pr_noise = np.random.normal(mean_pr, std_dev_pr, size=self.var.Precipitation.shape)
    # Adding noice to the precipitation data
    self.var.Precipitation = self.var.Precipitation * pr_noise 
    # Ensure that no precipitation values are below 0
    self.var.Precipitation = np.maximum(self.var.Precipitation, 0)

I understand that you might not be able to provide full support for this, as it is not currently in use. However, if you modify the readmeteo.py file as described above, you could potentially check the output files on your end. I would greatly appreciate any help or suggestions you can offer to resolve this issue. Looking forward to hearing from you soon. Thank you Best Regards

KumaGIS commented 1 month ago

Dear developers, If above implementation is wrong, could you please tell me what should be the correct steps to enable the montecarlo simulation ? If you are unable to provide detailed information, could you please tell me which parts of the source code should be updated? Then i can work on that.

I would greatly appreciate any help or suggestions you can offer to resolve this issue. Looking forward to hearing from you soon. Thank you

doc78 commented 1 month ago

Dear @KumaGIS

Can you please provide us your settings XML file and your full output messages on screen when you try to run Lisflood enabling MonteCarlo option?

Thanks Carlo

KumaGIS commented 1 month ago

Dear @doc78, As requested, I have attached the settings XML file used to simulate the LF_lat_lon use case. I encountered several errors but was able to resolve them one by one. Additionally, I have attached a small PDF document with screenshots detailing the errors for your reference.

I currently have the following questions regarding the Monte Carlo simulation:

The Monte Carlo simulation is now working. Initially, I ran the simulation with 10 ensemble members without applying any perturbations, and I enabled additional options such as repSnowMaps and repRainMaps. The time series data files ‘dis_run.tss’ and ‘chanqWin.tss’ were successfully created within each sample folder. However, the rain.nc and snow.nc maps were only created in the first sample folder. Why are these maps not being created in the other sample folders?

In the research paper https://doi.org/10.1016/j.jhydrol.2016.10.041, the LISFLOOD model was successfully used to generate open-loop deterministic simulations by applying random sampling perturbations to the precipitation inputs to obtain probabilistic LISFLOOD simulations with 24 ensembles. Specifically, the precipitation forcing was scaled between 0 and 100 and perturbed with white noise with a mean of 1 and a standard deviation of 0.15 to prevent ensemble deterioration. Could you please advise on how to implement something similar? I understand you might not be able to provide the entire solution, but could you at least outline the main steps?

I would greatly appreciate any help or suggestions you can offer to resolve this issue. Looking forward to hearing from you soon.

MC_Settings_XML.zip

KumaGIS commented 1 month ago

dear @doc78, @StefaniaGrimaldi, I successfully generated the ensemble simulations. However, there's an issue with the output: while the time series data is correctly created in the designated sample folders, the maps are only generated in the first sample folder. Is there a way to resolve this? I attempted to modify the output.py source code, but it still didn't work.

I would greatly appreciate any help or suggestions you can offer to resolve this issue. Looking forward to hearing from you soon.

doc78 commented 1 month ago

Dear @KumaGIS Unfortunately the new output maps module is not compatible with the MonteCarlo option. A quick workaround would be to re-initialize the output_maps instance at the first step of each dynamic loop, by adding the following lines after Line 57 in Lisflood_dynamic.py:

       if self.TimeSinceStart == 1 and settings.mc_set:
            # reset flag for netcdf output for all, steps and end
            _ = CDFFlags(uuid.uuid4())  # init CDF flags
           # initialize output maps again to match new folders in MonteCarlo simulation
           self.output_module.output_maps = OutputMapsFactory(self.output_module.var)

Hope it helps to continue your experiments. Best Regards Carlo

KumaGIS commented 1 month ago

Dear @doc78, Thank you for the update. Now it is working. I have one more question. Could you please let me know, which version of the LISFLOOD model finally supported data assimilation using EnKF?

koray-yilmaz commented 1 month ago

Dear @KumaGIS, I think it would be great help for the community if you could summarize the steps to allow Precipitation perturbation and ensemble LISFLOOD output. Looking forward to the discussions on data assimilation! Best.

KumaGIS commented 3 weeks ago

Dear @StefaniaGrimaldi, @doc78 Sure @koray-yilmaz. I will try my best to contribute to the data assimilation part.

I have one final question: What was the last version of the Lisflood model that supported the data assimilation framework?

StefaniaGrimaldi commented 3 weeks ago

Dear @KumaGIS,

the work on data assimilation with LISFLOOD has been performed quite some time ago: before the introduction of the versioning of the source code and before its open source publication. We do not have the source code that supported the data assimilation framework. We only have the part of the code that is still included in this repository (and which is not supported in the current version due to time and resources constraints).

Your willingness to develop your own data assimilation routine and to share your developments is highly appreciated! Thank you!

Kind regards, on behalf of the developers team, Stefania

KumaGIS commented 2 weeks ago

Dear @doc78 @StefaniaGrimaldi @domeniconappo Thank you for your support until now. I am a PhD student and I believe the Lisflood model can be used for data assimilation if I work on the model for a bit longer. I have a humble request. I will do the hard work and try to make the data assimilation framework work again. However, sometimes I need some guidance because certain parts of the model are a bit difficult to understand.

I don't have a team to work with and have to do everything alone. If I face some difficulties, I will ask for your assistance, and if possible, please try to help me however you can.

Thank you for your understanding and support Best regards

KumaGIS commented 1 week ago

Dear @doc78 @StefaniaGrimaldi @domeniconappo, I've made some updates and improvements to my workflow. I can now generate discharge ensembles by perturbing precipitation data with random Gaussian noise. Additionally, I've modified the PCRaster map outputs to be enabled during the Monte Carlo simulation and used the postmc() method to calculate ensemble statistics.

My next step involves editing the Lisflood_EnKF.py script to enable the assimilation process. However, I need some guidance on developing the setState(), setObservation(), and resume() methods.

Initially, I plan to assimilate observed streamflow data using the datasets you provided for lat_lon use cases. I've made some updates to the existing code and would appreciate it if you could review my changes and help correct any mistakes.

Here is how I updated the setObservations() method to assimilate data at the outlet point (one observation at a time):

`def setObservations(self):

timestep = self.currentTimeStep()

observedData = readmap(generateNameT("obs", timestep))

values = numpy.zeros(1)

values[0] = cellvalue(observedData, 10, 25)[0]

`

` # creating the observation matrix

here without added noise

observations = numpy.array([values,]*self.nrSamples()).transpose()

# creating the covariance matrix (nrObservations x nrObservations)
# here just random values
covariance = numpy.random.random((1,1))

self.setObservedMatrices(observations, covariance)`

I am unsure how to update the setState() and resume() methods. Could you please review my implementation and provide a draft script or suggestions for these methods? Any help or suggestions you can offer would be greatly appreciated.

Looking forward to your guidance. Thank you!