EcohydrologyTeam / ClearWater-riverine

A 2D water quality transporter model to calculate conservative advection and diffusion of constituents from an unstructured grid of flows
MIT License
6 stars 0 forks source link

Wetting / Drying leading to Concentration Spikes #41

Closed sjordan29 closed 3 months ago

sjordan29 commented 9 months ago

Wetting and drying cells leads to spikes in concentration. This issue can be resolved when we have developed a way to manage these spikes.

sjordan29 commented 9 months ago

Some notes on the issue, investigating using the model simulation from Sumwere Creek Plan 24 - thanks @jrutyna!

Cell 334 shown with a concentration of 1,064 mg/L.

image

Cell 334 had a zero-volume time step preceding the spike in concentration. The time step with the spike is the first time this cell is becoming wet in the simulation. image

Reviewing other hdf5 parameters at this time step of interest: image

The associated surface area is so small at this time step that RAS Mapper still shows the cell dry even though the cell has volume, but the water surface elevation has remained unchanged… this may be a result of the hdf5 file being reported in single precision.

image

aufdenkampe commented 8 months ago

@sjordan29 and @jrutyna, as I mentioned to both of you, this kind of behavior could occur if constituent mass is added to a cell before it's expected water volume.

Let's work together to dive into the algorithm to explore whether it might make sense to shift the mass transport time steps back in time by one relative to the water flux time steps.

@jrutyna mentioned to me that the is no constituent mass transport data to pair with the last tilmestep of the RAS model, so this might be a clue that we might further explore whether making this time shift is warranted.

sjordan29 commented 4 months ago

When solving for timestep 2619, when we look at the matrix for cell 334, we see the following values row | col = value 364 | 364 = 1.0. This is filled in here in the linalg module. This was an interim solve to make matrices with dry cells invertible. 253 | 364 = -1.7e-9 --> should be zero 254 | 364 = -1.1e-9 --> should be zero

Cell 334 should not have any off-diagonal values when solving for this timestep, because it is empty: this set-up in the matrix and the tables that @jrutyna posted below indicate that we should be using the t (here, 3618) timestep instead of the t+1 timestep (here, 3619) for the coefficients to the advection and diffusion terms: i.e., at the 3619 timestep, the face flow / face velocity / etc apply to what will be applied to the cell volume at the following timestep.

Making this change, we see the spike disappear:

image

This brings the interesting idea of potentially being able to collapse our matrix further. I opened issue #68 to address this separately.

@aufdenkampe & @jrutyna - thoughts? let me know if you want to do any testing on this before we merge to main.

sjordan29 commented 4 months ago

Tacking on some additional context while it's fresh in my mind.

gh-snap

In the transport equation above (full equation), we assume $n + \theta$ is equal to n+1 (first-order fully implicit Backward Euler scheme). In the original implementation, I assumed that all the advection/diffusion coefficient values (for advection: flow across face; for diffusion: face vertical area, diffusion coefficient, distance between cells) were at the n+1 timestep as well.

However, it appears that there's a bit of a time differential between these values and the cell volumes (at least in HEC-RAS output), as Jason shows in his post above: for example, the face flow in timestep n contributes to volume that will be recorded for a given cell in the n+1 timestep, but does not contribute to the cell volume at the n timestep itself. Therefore, I think we should use the values from the n timestep for the advection + diffusion coefficient terms when solving the equation above, because they are contributing to n+1 timestep volumes and therefore concentrations.

This should be considered further when we test with other flow fields. It seems like the issue mainly hinges on how data is tracked and stored across different models.

jrutyna commented 4 months ago

I would like to see the results from the mass balance function before and after this change just to make sure everything is working out in the entire simulation.

sjordan29 commented 4 months ago

Before: 4.87e-05 image

After: 1.18e-02 prct_error_mass image

Differences: bcTotalMassInOutAll, bcTotalMassOutAll and mass_end_calc. These all rely on the concentrations calculated (via mesh.face_flux_total); so now, if those concentrations are different based on a new matrix set-up, it makes sense that this would change.

Next steps

sjordan29 commented 3 months ago

Some notes -- will keep thinking on this, but wanted to jot a few things down to revisit on Monday.

  1. The DS_stage_value increases by 1,084 from 5,966,146 to 5,967,230 when we make this change. This means that some mass was "destroyed" or is at least no flowing out of the mesh in the old logic compared to the new logic. In the old logic, we put a dummy filler value in the diagonal for dry cells of 1.0. When this was placed with values on the off-diagnoal due to the time discrepancy discussed above, it meant that any cell that was wetting after being dry had an influx of artificial volume, causing the matrix to solve for a concentration/load in these cells that should not exist and taking from cells it shouldn't have been taken from at that timestep.
  2. I did some testing beyond this one cell, and found that there's still one cell that experiences a major concentration spike (cell 162) in Plan 24. It spikes from time slice 5548 to 5550. At time 5549, there is a very small volume in the cell (1.3068215e-10) and the concentration spikes to 24,455. If we consider that to be our n+1 timestep we're solving for, then we'd look at the face flow in timestep 5548 (n) for what values are contributing to the calculation. In all boundary cells (535, 575, 576, 577), the face flow is 0 at that timesteps. Perhaps a precision issue?
  3. The 3-hr sumwere creek simulation used for the demo also had an increase in mass balance error, but the final error was still low (1.107846e-02 % compared to -4.287278e-13%).
jrutyna commented 3 months ago

@sjordan29, yes there is a precision issue. HEC-RAS performs its computations in 64-bit, but only outputs to the hdf5 file as 32-bit.

image

sjordan29 commented 3 months ago

@jrutyna, compared the 100 answer with the simulation and got the following results:

image

I find it a little curious that the US_Flow_Mass values are different between the 100 answer and the simulation.

I was tweaking some of the code to try to resolve this, and found that the tests ended at the second to last timestep (I think due to a relic of a past bug in the code related to timesteps). Updating that code to use the last timestep in the _mass_bal_global_100_Ans function and _mass_bal_global functions, I get the following results:

image

Interestingly, this increases some of the differences between the answer and simulation, but the overall error for both tests is decreased.

sjordan29 commented 3 months ago

Just set the diffusion coefficient to zero which reduces the error compared to the 100 everywhere answer:

image

I will work on resolving where that 300 difference is coming in for the US_Flow_in_mass. I'd think that value should be identical from the answer to the simulation. I can see the downstream answers being slightly different due to the remaining wetting/drying spike

sjordan29 commented 3 months ago

I think I found the issue. In my _mass_flux method I should use the t+1 timestep for the parent_concentration and neighbor_concentration based on the transport equation using the fully implicit Backward Euler scheme (was previously using the t timstep).

Now the US_Flow_in_mass is the same between the 100 answer and our simulation and the only difference is for the mass flowing out, which increases in our simulation compared to the answer... so that means we're creating some mass, perhaps due to this precision issue?

image

aufdenkampe commented 3 months ago

@sjordan29, nice sleuthing! That's awesome.

To that output table, it might be useful to add a 4th column that divides the Diff by the Answer, similar to a percent difference but expressed in the more useful scientific notation. You could label it CV (Coefficient of Variation), even though it's not exactly that.

sjordan29 commented 3 months ago

image

aufdenkampe commented 3 months ago

These errors are now down to 4-5 significant figures (except for the last two lines that don't count because they're differences in errors). This is great news!