OpenDA-Association / OpenDA

Open data assimilation toolbox
GNU Lesser General Public License v3.0
86 stars 31 forks source link

Hamil Localization for DFlowFM 3D #352

Open NadaSulaiman opened 1 year ago

NadaSulaiman commented 1 year ago

Hello,

I have been trying to apply the Hamil localization algorithm to 3D version of DFlowFM, but I get the error attached.

For reference, regular EnKF and AutoZhang localization algorithms work well for both 2D and 3D version of the model.

With Hamil, I could only run the algorithm in 2D version (ie kmx=0 in .mdu file). I would get the error attached when I apply in 3D.

Is the algorithm compatible for 3D output file? Many thanks

Error_3D NoError 2D
nilsvanvelzen commented 1 year ago

I think this is an issue in the DFLOWFM wrapper and not OpenDA where the mask in case of a 3D model is not constructed correclty. It fails when it applies the mask and the mask has a different number of elements that the model state vector. I'll assign it to somebody who is involved with te wrappers to the Deltares models.

erikpelgrim commented 1 year ago

Hi Nada,

I assume you use a multi-layered restart file that is read via the DFlowFMRestartFileWrapper? Support for using this in combination with Hamill localization has been investigated and a quick fix is available in the LocalisationDFlowFM branch. This has not been properly tested and is not available in the master or 3.1.X release branches. If you want you can try the LocalisationDFlowFM branch or I can provide the code changes and you apply them locally. But I can't guarantee it will solve your problem or no others will arrise.

Regards, Erik

NadaSulaiman commented 1 year ago

Hi Erik and Nils

Many thanks for your prompt reply.

I have tried the branch which you have recommended, but I could not make it work.

Perhaps an error on my part, hence I am attaching the following example:

Simple_Model.zip

It is a simple 5x5 grid DFlowFM model, with both 2D and 3D (5-layers) versions.

In both models, temperature is being assimilated to the grid cell at an observation point located in middle of the domain, for the 3D version, it is assimilated to the surface layer.

You will find in the folder:

To make the example run, just change the path to the DELFT executable in the stochModel/bin folder.

All EnKF algorithms are set to store the gain factors in the directory, so you can compare the results. I have noticed that the gain values are outside the [0 to 1] range, would appreciate if I could get a verification on the validity of the calculated results as well.

As always, your support is highly appreciated.

Best wishes Nada

erikpelgrim commented 1 year ago

Hi Nada,

I tried the quick fix for your model and it works. If you want you can use

model_dflowfm_blackbox_Hamill3D_dirtyQuickFix.zip

However, this code will most likely not reach the Master or realease branches because it is quite a dirty fix. I am not sure when a decent fix will be available. It is up to you whether you use the model_dflowfm_blackbox.jar from this zip, if you do take into account that whenever you switch to a real or newer OpenDA branch / release this fix will not be present and the problem can reoccur.

Hope it is helpful nonetheless.

Cheers, Erik

NadaSulaiman commented 1 year ago

Hi Erik

Once again, many thanks for your help. I shall try it today and give back my feedback.

The error I got from the orginal localisation branch was from the NetcdfDataObject.java file, where the one in the branch did not accept the layerDimensionName=laydim argument in the wrapper file. I swapped it with NetcdfDataObject.java from the current directory and it seems to do the trick, but I got these strange results.

2D Results were correct. I tried on complex model and the simple model which I uploaded

Screen Shot 2023-08-25 at 2 15 30 PM

Once the layers of the model become more than 1 in 3D, the coordinates or the indexing of the kalman gain appears to be incorrect

Screen Shot 2023-08-25 at 2 15 50 PM Screen Shot 2023-08-25 at 2 15 42 PM Screen Shot 2023-08-25 at 2 15 37 PM

It was brought to my attention when I was examining the analysis results

Screen Shot 2023-08-25 at 2 22 28 PM Screen Shot 2023-08-25 at 2 23 22 PM

I will be trying the new fix on the complex and simple model and will give back my feedback here

Once again appreciating your help!

NadaSulaiman commented 1 year ago

Hi Nada,

I tried the quick fix for your model and it works. If you want you can use

model_dflowfm_blackbox_Hamill3D_dirtyQuickFix.zip

However, this code will most likely not reach the Master or realease branches because it is quite a dirty fix. I am not sure when a decent fix will be available. It is up to you whether you use the model_dflowfm_blackbox.jar from this zip, if you do take into account that whenever you switch to a real or newer OpenDA branch / release this fix will not be present and the problem can reoccur.

Hope it is helpful nonetheless.

Cheers, Erik

Hi Erik. I see the file has the file model_dflowfm_blackbox.jar How do I use it in OpenDA? I'm not sure where to copy it in my directory. Thanks

erikpelgrim commented 1 year ago

Hi Nada,

there should be a directory with OpenDA binaries called bin. In there should already be a model_dflowfm_blackbox.jar, this should be replaced

NadaSulaiman commented 1 year ago

Hi Erik.

Just gave the code ago. Happy to report that it is working.

Screen Shot 2023-08-25 at 5 57 18 PM

However the problem of the computed Kalman Gain indexing are still there.

I had written an independent MATLAB code to compute the Kalman Gain, and upon further investigation, it seems that OpenDA do calculate the values correctly, but misplace them on a different index.

For the cases of 2D and 3D models of one layer (ie where the state size and the size of the coordinates are the same), both MATLAB and OpenDA have identical Kalman Gain.

Screen Shot 2023-08-25 at 5 30 30 PM

For the case of 3D with layers +1, OpenDA compute the value of the gain correctly, but misplace them across all layers.

Screen Shot 2023-08-25 at 5 30 20 PM

Kindly find the screenshots attached to illustrate my point.

I am not sure if the values (indexing) of rh0 are effected by this problem as well, but from the plots I think it is ok.

erikpelgrim commented 1 year ago

Hi Nada,

as far as I know there indeed have only been applications of 3D with the usage of just 1 layer. So I do believe there might be some fixes needed. But I've been debugging the writing of the kalman gain to xml but so far I did not see anything obviously wrong yet.

Can you help me pinpoint the problem by answering the next questions:

NadaSulaiman commented 1 year ago

Hi Eric,

I believe it should be a straightforward fix, as it seems to be an indexing issue rather than a calculation issue.

- Do you get the kalman gain from the written xml files: kalmanGainStorage.xml? Yes, I will attach the MATLAB code which does retrieve the values directly from kalmanGainStorage.xml, as computed bu OpenDA - How are you plotting them? Will attach code below. - Did you already try reading the written kalmanGainStorage.xml in OpenDA via a SteadyStateFilter? I tried on a different setup, SteadyStateFilter do work, but again, it does follow the order of values which is written in the .xml file. - Are the mixed up kalman gain values messing up the results or is it the problem just in writing? Yes, they directly affect the analysis result in the model (ie the computed initial conditon). That is how I spotted the issue. When I was looking at the map file from DELFT QuickPlot, I spotted the strange stripes (see picture I attached previously).

MODEL (27MB): https://www.dropbox.com/scl/fi/ckuvctglzt7gqdc7aeytx/Simple_Model.zip?rlkey=h0ycs6w3drniuh1ym785pu6mf&dl=0

I have already ran the model for 3D ENKF assimilation (one timestep), so result files are already uploaded in the .zip file. Please go the folder 'Matlab', and run the following codes in order (change the path files inside the code to match your directroy): Kalman_MATLAB.m Kalman_OpenDA_plot.m Comparision_MATLAN_OPENDA.m

I compared the computed Kalman Gain by OpenDA and Matlab for both cases where observation is at surface (ie layer4) and at bottom (ie layer0). It seems that OpenDA can identify the location relative to the state variables correctly (values for both MATLAB and OpenDA match up when you compare them in ascending order). But indexing issue is there for both cases.

You should get the same plots I have attached below when you run the code.

Please note that I haven't checked if there is an indexing issue with the 3D values for rh0 (weighting matrix). Perhaps it's worth checking that to once the problem is fixed with observations located at different layers.

Best wishes, Nada

Screen Shot 2023-09-01 at 5 44 33 PM Screen Shot 2023-09-01 at 5 32 45 PM
erikpelgrim commented 1 year ago

Hi Nada,

I tracked down 1 place where the layers messed up the coordinates. It was inside the dirty quick fix for the Hamill localisation. You can try a new model_dflowfm_blackbox.jar from the next zip:

model_dflowfm_blackbox_dirtyQuickFix2.zip

I am, however, a bit confused about the order and the exact (combination) of versions you are using and whether or not another fix at a different place will be needed.

But before I dive into that, please give the new jar file a try and let us know whether the problem is resolved.

Cheers, Erik

NadaSulaiman commented 1 year ago

Hi Erik,

The issue with the messed up coordination was there for both the current version at the OpenDA directory, and Hamil localisation branch (in both cases, I applied regular EnKF without localisation, hence it can be done on DFLOWFM 3D cases). As localisation is essentially an element wise multiplication with rho, I would assume this should fix the issue for both directories.

I will test it out today and will report back here.

Many thanks again!

erikpelgrim commented 1 year ago

I just updated the jar file, I realized I uploaded the wrong file. I fixed it in the previous comment.

NadaSulaiman commented 1 year ago

Hi Erik,

Just ran the code with the updated Jar file, it is still giving kalman gain with the same issue,

Screen Shot 2023-09-11 at 8 04 16 PM

Have you tried comparing the Kalman Gain values with the MATLAB code I have provided? It would be great if we could see weather you are getting the same plots or not.

Best wishes Nada

NadaSulaiman commented 1 year ago

Hi Erik,

Just ran the code with the updated Jar file, it is still giving kalman gain with the same issue, Screen Shot 2023-09-11 at 8 04 16 PM

Have you tried comparing the Kalman Gain values with the MATLAB code I have provided? It would be great if we could see weather you are getting the same plots or not.

Best wishes Nada

The picture above is from the localisation branch. Picture below is from the current files on the main OpenDA directory as we speak. Issue is also there.

Screen Shot 2023-09-11 at 8 25 16 PM
NadaSulaiman commented 1 year ago

I compared QuickFix v1 vs v2 for Hamil. I think V1 applies the localisation mask correctly (it looks to me this way). If you disregard the issue with the original pre localisaion kalman gain values (ie ENKF values), I think QuickFix v1 does a correct job applying the distance filtering on them.

Results below are from the example I attached earlier. Plotted with the MATLAB code provided above.

Screen Shot 2023-09-12 at 2 47 49 PM
erikpelgrim commented 1 year ago

Hi Nada,

thank you for testing this. It is unfortunate that it did not resolve the problem. This most likely means that the problem is at a different part of OpenDA than I thought and will require more time to pinpoint where exactly. I cannot promise progress on the short term because I do not know when I will have time for another analysis.

Best, Erik

NadaSulaiman commented 1 year ago

Hi Erik.

I have figured out exactly where the problem lies in 3D case (>1 layer). I hope this explanation can help you adjust the code.

OpenDA is currently writing Kalman Gain with a structure that combines elements from multiple layers. DFLOWFM expect the Kalman Gain to be organized on a per-layer basis, where all 'x' elements from each layer are grouped together.

Screen Shot 2023-09-13 at 1 21 58 AM

You can see now why in the case of 1 layer, the kalman gain computed by OpenDA and the one computed by the matlab code are identical. And there are no issue when they translate into a DFLOW restart file.

To confirm this, I took the values of kalman gain directly from kalmanGainStorage.xml for the 3D case and rearranged them based on the way DFLOWFM expect them to be (per-layer basis), they matched those computed independantly by MATLAB.

Screen Shot 2023-09-13 at 1 22 11 AM

I hope this can help you speed up pinpointing the issue in the code. Best wishes!

erikpelgrim commented 1 year ago

Hi Nada,

thank you for this additional analysis. My previous fix was trying to restore this problem but seemed to have made it worse. But you did confirm I was on the right track. I expect a similar fix needs to be somewhere else, or maybe even multiple fixes.

To double check if we indeed are talking about the same I think the DFlow FM restart file puts the values for different layers next to eachother, since the layerdim is the last of the dimensions of tem1:

double tem1(time=1, nFlowElem=25, laydim=5);

I normally am used to have the layer dimension before the horizontal grid dimension My next steps would be to go trough multiple places in the code where this may go wrong.

Cheers, Erik

NadaSulaiman commented 1 year ago

Hi Erik. I'm glad I could be of help.

The dimension of the restart file is as follows:

Screen Shot 2023-09-13 at 11 45 32 AM

So if you import tem1, it would be 5x25 variable, and each row represent the layer, starting from layer0 (bottom).

The good news is, even with the missed arrangement of the elements, OpenDA was able to calculate the kalman gain value correctly. Would this issue potentially affects the arrangement of the weighting matrix (rho) for autozhang and hamil?

erikpelgrim commented 1 year ago

Hi Nada,

I see we use different netcdf tooling, I use Panoply which shows the dimensions the other way around:

double tem1(time=1, nFlowElem=25, laydim=5); :coordinates = "FlowElem_xcc FlowElem_ycc"; :standard_name = "sea_water_temperature"; :long_name = "temperature in flow element"; :units = "degC";

But this does make it a likely cause of the mismatch.

I think all data coming from such netcdf files are affected and I will have to check them all.

NadaSulaiman commented 1 year ago

I use the netcdf command line in MATLAB to check the dimension.

I have checked how OpenDA reads the state (temperature from map file) through the OpenDA GUI, and it seems to be picking it up correctly, so hopefully not all netcdf files are affected.

Kindly let me know if you need me to test things out on my end.

Many thanks again for the support!

NadaSulaiman commented 1 year ago

Hi Erik

As a temporary fix, I wrote a .bat file that rearranges the values inside the .nc restart file (after the analysis) for the 3D case inside each ensemble directory.

Is there a way, either inside the Algorithm.xml or inside the .oda file where I can direct OpenDA to run this .bat file after it generates the analysis file and before it goes into another simulation run?

Many thanks!

erikpelgrim commented 1 year ago

Hi Nada,

I was and still am limited on time, but I made another quick fix for you to try:

model_dflowfm_blackbox_transposeHack_quickFix3dCoords.zip

The code for the DFlowFMRestartFileWrapper transposes the 2nd and 3rd dimension on read and transposes them back when writing. This is only done for 3D variables.

The other change I reverted back to the first quick fix where the coordinates for Hamill were correct as I understood from you.

I do not have the time to look as intensively at the results as you. So I hope you can confirmed or deny whether these quick fixes work. If they work I can try to make decent fixes for them, they may need some extra configuration then.

Cheers, Erik

erikpelgrim commented 1 year ago

Hi Nada,

about your question when to run the .bat script.

The only place I know you can run them is in the D3DWrapper.xml. The place would then be inside the <computeActions>

More specifically I would expect such scripts to be placed just before and after:

            <action workingDirectory="%instanceDir%" linuxExe="%exeDir%/start_dimr.sh" windowsExe="%exeDir%/start_dimr.bat">
                <arg>%dimrconfigfile%</arg>
                <checkOutput file="%outputDir%/%hisfile%"/>
                <checkOutput file="%outputDir%/%mapfile%"/>
            </action>

Which runs DFlow FM

Let me know if this is helpful.

Cheers, Erik

NadaSulaiman commented 12 months ago

Hi Nada,

about your question when to run the .bat script.

The only place I know you can run them is in the D3DWrapper.xml. The place would then be inside the <computeActions>

More specifically I would expect such scripts to be placed just before and after:

            <action workingDirectory="%instanceDir%" linuxExe="%exeDir%/start_dimr.sh" windowsExe="%exeDir%/start_dimr.bat">
                <arg>%dimrconfigfile%</arg>
                <checkOutput file="%outputDir%/%hisfile%"/>
                <checkOutput file="%outputDir%/%mapfile%"/>
            </action>

Which runs DFlow FM

Let me know if this is helpful.

Cheers, Erik

Hi Erik. Can confirm that it worked. I used it to introduce noise/uncertainty independently through model discharge definition (.tim file). Many thanks!

NadaSulaiman commented 12 months ago

Hi Nada,

I was and still am limited on time, but I made another quick fix for you to try:

model_dflowfm_blackbox_transposeHack_quickFix3dCoords.zip

The code for the DFlowFMRestartFileWrapper transposes the 2nd and 3rd dimension on read and transposes them back when writing. This is only done for 3D variables.

The other change I reverted back to the first quick fix where the coordinates for Hamill were correct as I understood from you.

I do not have the time to look as intensively at the results as you. So I hope you can confirmed or deny whether these quick fixes work. If they work I can try to make decent fixes for them, they may need some extra configuration then.

Cheers, Erik

Hi Erik,

I would like to thank you first for your support on this issue. I have been able to make huge progress in my research, which is centralized on the application of OpenDA, since implementing those fixes.

I can confirm that the recent fix has resolved the issue. I tried it through the localisation version. I compared the values of the kalman gain of OpenDA to those computed by my code, and I am happy to report that they match.

Screen Shot 2023-09-25 at 6 50 44 PM

I tried it as well on a more complex model (assimilation is targeted at layer4) and I can see that both kalman gain pre and post localisation make sense.

Screen Shot 2023-09-25 at 6 51 13 PM

Many thanks again.

And please, feel free to include the simplemodel.zip example which I attached in previous comment to the main directory if you feel it might be beneficial to new users who want to experiment with an example that assimilates temperature in 2D and 3D. I included all assimilation algorithm in it.

Best wishes, Nada

erikpelgrim commented 12 months ago

Hi Nada,

Great to hear it is working properly now! I will try to make decent versions for the fixes so it can be included in the master branch.

Also thank you for your intense testing, analysis and reporting to us. This will help others that will do similar things. I think your simple model will definitely be useful to include in our examples.

If sometime you want to show / tell more about what you are doing and why in a presentation (or other form) let us know. Me and multiple of my colleagues have expressed an interested in that.

Cheers, Erik

NadaSulaiman commented 12 months ago

Hi Erik, Thank you for showing interest in our work. That would be a fantastic opportunity. I will email you to arrange something soon.

Best wishes Nada