OpenDrift / opendrift

Open source framework for ocean trajectory modelling
https://opendrift.github.io
GNU General Public License v2.0

Uniform random numbers for horizontal diffusion? #1159

Open knutfrode opened 10 months ago

knutfrode commented 10 months ago

As noted by @Boorhin in the Slack channel, uniform random numbers are used for the vertical diffusion, whereas normally (Gaussian) distributed random numbers are used for the horizontal diffusion: https://github.com/OpenDrift/opendrift/blob/master/opendrift/models/basemodel.py#L2416

Probably it should be uniformly distributed random numbers also for the horizontal diffusion? I.e. we should change `x_vel = self.elements.moving * np.sqrt(2*D/dt) * np.random.normal(scale=1, size=self.num_elements_active())` to `x_vel = self.elements.moving * np.sqrt(2*D/dt) * np.random.uniform(scale=1, size=self.num_elements_active())`

Do you agree @johannesro @nordam ?

Boorhin commented 10 months ago

Hi, in this case it would be changed to `x_vel = self.elements.moving * np.sqrt(6*D/dt) * np.random.uniform(scale=1, size=self.num_elements_active())`. But beware: that gives very different results in my tests, and I am not sure it is the correct way. The Langevin equation implies a Gaussian distribution for eddy diffusivity. However, the paper of Visser 1996 implies that a uniform distribution is good for vertical diffusion, but it needs very small time-steps compared to the diffusion coefficient (K). It is a bit tricky, as in fact the time-step should scale with the vertical gradient of the diffusivity variation: $K'' \ll 4/dt$.

I have done some naive 1D tests from a point source with the two different models:

[image: cumulative_drift]

I also did the double-gyre tests with Runge-Kutta 4, two time-steps and three different diffusion coefficients scaled close to the advection speed:

[image: diffusion_test_0 001] [image: diffusion_test_0 0001] [image: diffusion_test_0 00001]

It is my understanding that the normal behaviour is to stay within the double gyre, according to the LCS tutorial. In effect, choosing between uniform and Gaussian affects the turbulence quite significantly. I know a particle tracker that uses random normal draws, https://github.com/bjornaa/ladim1/blob/07ce517cc1702ef7c7cee688599e427fd6f6df1f/ladim1/tracker.py#L214, and a commercial one (newdepomod) that claims to use a uniform one according to its manual (I haven't read the code). That's all the information I could gather.
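For reference, a minimal check (a sketch with illustrative values of D, dt and N, not OpenDrift code) that the sqrt(6*D/dt) uniform scaling and the sqrt(2*D/dt) Gaussian scaling give random velocities with the same variance, 2*D/dt:

import numpy as np

# Sketch comparing the two scalings; D, dt and N are illustrative values.
D, dt, N = 0.1, 600.0, 1_000_000
rng = np.random.default_rng(42)

v_gauss = np.sqrt(2*D/dt) * rng.normal(0, 1, N)    # Gaussian velocity draw
v_unif = np.sqrt(6*D/dt) * rng.uniform(-1, 1, N)   # uniform velocity draw

# Both sample variances should be close to 2*D/dt
print(v_gauss.var(), v_unif.var(), 2*D/dt)

The factor 6 comes from the variance of U(-1, 1) being 1/3, so that 6*(1/3) = 2.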

oyvindbreivik commented 10 months ago

Hi,

Interesting discussion. My two bits: Gaussian perturbations are a more natural choice than uniform perturbations. As noted by Julien above, the Langevin equation would lead to Gaussian perturbations, and rather than changing the horizontal diffusion to become uniform, I would consider changing the vertical one to be Gaussian.

Best

Oyvind


Boorhin commented 10 months ago

Hi @oyvindbreivik, have you read the Visser 1996/1997 paper? If not, it is easy to find online. My impression is that using a Gaussian distribution is less scale-dependent, so that mixing every minute becomes unnecessary? That would speed up computations, I believe. I am really no specialist and input is very welcome. Cheers, Julien
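As a rough illustration of the time-step criterion mentioned above ($K'' \ll 4/dt$, i.e. $dt \ll 4/K''$), here is a sketch with a made-up vertical diffusivity profile; the profile and all numbers are purely illustrative:

import numpy as np

# Hypothetical vertical diffusivity profile K(z); values are made up.
z = np.linspace(0, 50, 501)                      # depth [m]
K = 1e-3 + 1e-2 * np.exp(-((z - 20.0)/5.0)**2)   # K(z) [m^2/s]

# Second derivative K''(z), units 1/s
Kpp = np.gradient(np.gradient(K, z), z)

# Largest dt satisfying dt << 4/K'' for this profile
dt_max = 4.0 / np.abs(Kpp).max()
print(f'dt should be well below ~{dt_max:.0f} s')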

knutfrode commented 10 months ago

Here is that paper (but 1997, not 1996) https://www.int-res.com/articles/meps/158/m158p275.pdf

But awaiting @nordam to shine some light onto this dark matter

ChrisBarker-NOAA commented 10 months ago

I looked into this way back when -- we use a uniform distribution in PyGNOME -- after a number of timesteps the results should be normal regardless of the distribution used for the random walk -- you do need to make sure that the variance is the same.

My take: it doesn't matter which you use -- I think we went with uniform back then because it was substantially faster to get a uniform random number -- maybe that's changed.

@Boorhin: if you got substantially different results, then perhaps they weren't scaled equivalently? Or not enough timesteps passed for the solution to converge -- but I recall it didn't take many timesteps (ten or so?)
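A quick way to see this convergence (a sketch; the step count and scalings are illustrative, and scipy is assumed to be available): sum a handful of uniform steps with unit variance per step and test the sum against a Gaussian:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_steps, n_particles = 10, 100_000

# Uniform steps scaled to unit variance per step (std of U(-1,1) is 1/sqrt(3))
steps = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n_steps, n_particles))
displacement = steps.sum(axis=0)

# Compare the normalized sum against a standard normal distribution
ks = stats.kstest(displacement / np.sqrt(n_steps), 'norm')
print(ks.statistic)  # a small statistic means close to Gaussian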

See Appendix B in the GNOME Tech Doc:

GNOME_tech_doc.pdf

Interestingly (to me) -- this was the very first contribution I made to the GNOME project 24 years ago!

nordam commented 10 months ago

I think @ChrisBarker-NOAA is right, it probably doesn't matter when the timestep is small enough. From the central limit theorem, we know that the sum of $N$ random numbers (with finite variance) approaches a Gaussian when $N$ is large (and with a small timestep, we are effectively adding a large number of random contributions).

That said, you might get away with using a longer timestep if you use Gaussian random numbers to begin with, so I would also agree with @oyvindbreivik that it's best to use Gaussian random numbers in both cases. (Though on the other hand, Gaussian random numbers can in principle be arbitrarily large, so once in a while you might have to handle some strange edge case, such as particles ending up so far above the surface that a simple reflection would place them below the sea floor. Provided we handle such cases, Gaussian random numbers are better in my opinion.)

Regarding the specific 1D example above, it's a bit hard to say without seeing the code that produced the plots, but it looks to me like the variance grows faster in the uniform case than in the Gaussian case, which would indicate a mistake in the code. Also, the line `self.elements.moving * np.sqrt(6*D/dt) * np.random.uniform(scale=1, size=self.num_elements_active())` doesn't make sense to me. `np.random.uniform` does not take `scale` as a keyword argument, but even if we changed it to something like `np.random.uniform(low=-1, high=1, size=...)`, the random steps would be smaller with longer timesteps, whereas they should be larger.

A useful test is that if the diffusivity, $D$, is a constant (and in the absence of reflecting boundaries), then the variance of the particle distribution should grow linearly with time, as $2Dt$ (assuming all the particles start at the same location).

Here is some example code and some plots showing the 1D example as I would have implemented it, as well as the variance test.

import numpy as np
import matplotlib.pyplot as plt

# Parameters
D = 0.02 # Diffusivity
Np = 100 # Number of particles 
timesteps = [60, 600]
Tmax = 6*3600

var_gaussian = []
var_uniform = []
t_list = []

fig, ax = plt.subplots(ncols=2, nrows=2, figsize=(9,6), sharex=True, sharey=True)
for i, dt in enumerate(timesteps):
    Nt = int(Tmax/dt)
    Z_uniform = np.zeros((Nt+1, Np))
    Z_gaussian = np.zeros((Nt+1, Np))
    for n in range(Nt):
        # Uniform
        # Generate random numbers with variance dt
        dW = np.random.uniform(low=-np.sqrt(3*dt), high=np.sqrt(3*dt), size=Np)
        Z_uniform[n+1,:] = Z_uniform[n,:] + np.sqrt(2*D)*dW

        # Gaussian
        # Generate random numbers with variance dt
        dW = np.random.normal(loc=0, scale=np.sqrt(dt), size=Np)
        Z_gaussian[n+1,:] = Z_gaussian[n,:] + np.sqrt(2*D)*dW

    t = np.linspace(0, Tmax, Nt+1)        

    # Calculate variance as a function of time for both cases
    var_gaussian.append(np.var(Z_gaussian, axis=1))
    var_uniform.append(np.var(Z_uniform, axis=1))
    t_list.append(t)

    # Plot particle trajectories
    ax[i,0].plot(t, Z_uniform, c='k')
    ax[i,1].plot(t, Z_gaussian, c='k')
    ax[i,0].set_title(rf'Uniform, $\Delta t={dt}$')
    ax[i,1].set_title(rf'Gaussian, $\Delta t={dt}$')

plt.tight_layout()
plt.savefig('random_comparison.png', dpi=240)

# Plot variance as a function of time in a separate figure
fig = plt.figure(figsize=(7,4))
plt.plot(t_list[0], var_gaussian[0], label=rf'Gaussian, $\Delta t={timesteps[0]}$')
plt.plot(t_list[1], var_gaussian[1], label=rf'Gaussian, $\Delta t={timesteps[1]}$')
plt.plot(t_list[0], var_uniform[0], label=rf'Uniform, $\Delta t={timesteps[0]}$')
plt.plot(t_list[1], var_uniform[1], label=rf'Uniform, $\Delta t={timesteps[1]}$')

plt.plot(t_list[0], 2*D*t_list[0], '--', c='k', label='$2Dt$')
plt.legend()
plt.tight_layout()
plt.savefig('variance.png', dpi=240)

[image: random_comparison]

[image: variance]

Boorhin commented 10 months ago

Thanks for your comments, lots of food for thought and more examples. The 1D figure is only the cumulative sum of the random draws in a naive way. I can scale it to the diffusion coefficient, but it is the same as taking D = 1/6 and dt = 1; basically `np.random.normal().cumsum()/3` and `np.random.uniform().cumsum()`.

@nordam, I am not sure why the distributions are scaled with dt in your example. I might be wrong, but the implementation in OpenDrift is `np.sqrt(2*D/dt)*np.random.normal(scale=1, size=N)`, and the uniform equivalent would be `np.sqrt(6*D/dt)*(2*np.random.random(N)-1)`. I may have missed some scaling somewhere, but the width of the distribution is 1, not dt. However, it makes a good case for a lot of similarity. I will replot with scale and post it here. I am still looking at the consequences.

The use of uniform draws is mostly due to the fact that they used to be much faster to generate on a computer, but the algorithms have changed now and Gaussian is easier. The big question, if I follow you: if you draw a lot of uniforms instead of one Gaussian, will it be faster? Will it be as accurate (it seems the same from what is said, right)?

I think the timestep matters in my case, as small timesteps drive much longer computations. Also, considering how the ballistics are calculated from the hydrodynamic model, subsampling the model in time makes little sense, as OpenDrift uses a nearest-neighbour approach. I know, another level of complication, but I would rather run a lot of particles to get better results than few particles at very high temporal resolution. Or there should be a complete decoupling between the ballistics and the random walks, but that's another discussion for OpenDrift.

nordam commented 10 months ago

If it says `np.sqrt(2*D/dt)*np.random.normal(scale=1, size=N)`, then I would guess this gets multiplied by dt later on, so that it works out to `np.sqrt(2*D*dt)*np.random.normal(scale=1, size=N)`, which is exactly the same as `np.sqrt(2*D)*np.random.normal(scale=np.sqrt(dt), size=N)`.
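That equivalence is easy to check numerically (a sketch; D and dt are illustrative values):

import numpy as np

# Velocity form times dt versus displacement form: same distribution.
D, dt, N = 0.1, 600.0, 1_000_000
rng = np.random.default_rng(1)

dx_vel = np.sqrt(2*D/dt) * rng.normal(size=N) * dt   # velocity form * dt
dx_disp = np.sqrt(2*D*dt) * rng.normal(size=N)       # displacement form

print(dx_vel.std(), dx_disp.std(), np.sqrt(2*D*dt))  # all three should agree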

The reason I prefer to scale the random numbers directly is to make the code look more similar to what you see when you look up numerical schemes of SDEs. In these schemes, there usually appears a variable $\Delta W$, which is supposed to be a Gaussian random number with zero mean and variance $\Delta t$. See for example Kloeden & Platen, pages 305-306 for a description of the Euler-Maruyama scheme, which is the one that is used here.

https://link.springer.com/book/10.1007/978-3-662-12616-5
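For illustration, a minimal Euler-Maruyama sketch for $dX = \sqrt{2D}\,dW$ with constant $D$, written with $\Delta W$ drawn as $N(0, \Delta t)$ as in Kloeden & Platen (a sketch with illustrative parameters, not the OpenDrift implementation):

import numpy as np

def euler_maruyama(D, dt, n_steps, n_particles, rng):
    """Integrate dX = sqrt(2*D) dW with the Euler-Maruyama scheme."""
    X = np.zeros(n_particles)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_particles)  # variance dt
        X = X + np.sqrt(2*D) * dW
    return X

rng = np.random.default_rng(2)
X = euler_maruyama(D=0.02, dt=60.0, n_steps=360, n_particles=10_000, rng=rng)
print(X.var(), 2 * 0.02 * 60.0 * 360)  # variance should be close to 2*D*t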

Some of these papers may also be useful to look at:

https://link.springer.com/article/10.1007/s10236-012-0523-y
https://www.sciencedirect.com/science/article/pii/S0025326X19305338
https://arxiv.org/pdf/2010.11890.pdf

nordam commented 10 months ago

Regarding the point about what is faster:

In terms of computer time to generate the random numbers, I ran this test in a jupyter notebook, and it seems that the Gaussian draw takes about 3 times longer than the uniform (2.3 milliseconds against 750 microseconds).

import numpy as np

N = 100000
%timeit X = 2*np.random.random(N) - 1
%timeit X = np.random.normal(loc=0, scale=1, size=N)

On the other hand, if the Gaussian random numbers allow you to take somewhat larger timesteps, then perhaps it's faster in terms of computational time for the whole simulation. That would require some testing to find out. On the whole, though, I would guess that the time spent generating random numbers is a pretty small fraction of the simulation time overall.
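For completeness, the same comparison outside a notebook, using the standard-library timeit module (a sketch; absolute timings will vary with machine and NumPy version):

import timeit
import numpy as np

N = 100000
# Average time per call over 100 repetitions, in seconds
t_unif = timeit.timeit(lambda: 2*np.random.random(N) - 1, number=100) / 100
t_norm = timeit.timeit(lambda: np.random.normal(loc=0, scale=1, size=N), number=100) / 100
print(f'uniform: {t_unif*1e3:.2f} ms, normal: {t_norm*1e3:.2f} ms')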

Boorhin commented 10 months ago

That's from `opendrift/models/basemodel.py`:

    def horizontal_diffusion(self):
        """Move elements with random walk according to given horizontal diffuivity."""
        D = self.get_config('drift:horizontal_diffusivity')
        if D == 0:
            logger.debug('Horizontal diffusivity is 0, no random walk.')
            return
        dt = np.abs(self.time_step.total_seconds())
        x_vel = self.elements.moving * np.sqrt(2 * D / dt) * np.random.normal(
            scale=1, size=self.num_elements_active())
        y_vel = self.elements.moving * np.sqrt(2 * D / dt) * np.random.normal(
            scale=1, size=self.num_elements_active())
        speed = np.sqrt(x_vel * x_vel + y_vel * y_vel)
        logger.debug(
            'Moving elements according to horizontal diffusivity of %s, with speeds between %s and %s m/s'
            % (D, speed.min(), speed.max()))
        self.update_positions(x_vel, y_vel)

I assume that's because we are speaking about velocity and not position. Thanks for the refs, I will have a closer look.

Boorhin commented 10 months ago

I have made another benchmark of horizontal diffusion, trying to follow @nordam's method, with a scale that resembles what we could use for the plankton modelling we do. I must admit I don't know how to interpret these results. If we refer to the ideal variance $2Dt$, all the curves seem very tight. Looking at a closer scale, plotting $|2Dt - \mathrm{var}|$, the differences are clearer, but I struggle to interpret them. The large timestep performs better in the long run; the high-frequency mixing seems to generate a lot of divergence for both methods after a certain time, while the larger timestep seems poorer at the beginning. I would say I am none the wiser on how to select the right diffusion timestep. I attach the code for QC, and the resulting figure.

import numpy as np
import matplotlib.pyplot as plt

D = 0.1                       # horizontal diffusivity [m^2/s]
time_step = [60, 300, 600, 1200, 3600]
nb_part = 10000
time = 38*24*3600             # 38 days of particle tracking [s]

# One independent, reproducible generator per particle
ss = np.random.SeedSequence(12345)
child_seeds = ss.spawn(nb_part)
streams = [np.random.default_rng(s) for s in child_seeds]

gaussians = {}
uniforms = {}
for dt in time_step:
    size = int(time/dt)
    gaussians[str(dt)] = np.empty((nb_part, size))
    uniforms[str(dt)] = np.empty((nb_part, size))
    for i in range(nb_part):
        # Random steps with variance dt per step, accumulated over time
        gaussians[str(dt)][i] = np.sqrt(2*D)*streams[i].normal(0, dt**.5, size).cumsum()
        uniforms[str(dt)][i] = np.sqrt(2*D)*streams[i].uniform(-(3*dt)**.5, (3*dt)**.5, size).cumsum()

fig, ax = plt.subplots(2, 1, sharex=True)
ax[0].plot(np.arange(time), np.arange(time)*2*D, '--', c='k', label='2Dt')
ax[1].plot(0, 0, c='k')
for dt in gaussians.keys():
    t = np.arange(0, time, int(dt))
    ax[0].plot(t, np.var(uniforms[dt], axis=0),
               '--', label=rf'Uniform, $\delta t: {int(int(dt)/60)}min$')
    ax[0].plot(t, np.var(gaussians[dt], axis=0),
               '-', label=rf'Gaussian, $\delta t: {int(int(dt)/60)}min$')
    ax[1].plot(t, np.abs(np.var(uniforms[dt], axis=0) - t*2*D),
               '--', label=rf'Uniform, $\delta t: {int(int(dt)/60)}min$')
    ax[1].plot(t, np.abs(np.var(gaussians[dt], axis=0) - t*2*D),
               '-', label=rf'Gaussian, $\delta t: {int(int(dt)/60)}min$')
ax[0].legend(loc='upper left', ncol=3)
fig.suptitle(f'Diffusion of {nb_part} particles at {D} m²/s')

# Relabel the shared x axis from seconds to days
pos = [item.get_position()[0] for item in ax[0].get_xticklabels()]
new_lab = np.round(np.array(pos)/3600/24, decimals=2).astype('U5')
ax[1].set_xticks(pos)
ax[1].set_xticklabels(new_lab)
ax[1].set_xlabel('days')
fig.tight_layout()

[image: H_diffusion_evaluation]

ChrisBarker-NOAA commented 10 months ago

I"ll try to take a closer look in a bit, but I think one issue may be simple random variation -- you've fixed the seed(s) so will always get the same results, ,but if you let it pick a seed on its own, each run will result in a different answer, and I think which timestep / method results in the most "correct" answer will change -- that seemed to be the case with a quick test I ran.

nordam commented 10 months ago

To me, all those variance plots look perfectly reasonable, and within expected variation. If you increase the particle number, the lines will be a better match for the theoretical expression.

In any case, plotting the variance is just a useful check to ensure that you have scaled things correctly (which it looks like you have). For finding parameters for actual simulations, you need to think about things like the actual diffusivity you will use, the time resolution of the input data, the scales you are interested in, etc.

Boorhin commented 10 months ago

@ChrisBarker-NOAA, if you get the time, that would be very much appreciated. Regarding randomness, NumPy has recommended for the last 5-10 years that random number generation be done with generators. The idea of this code is that it is reproducible and more random (knowing that all numbers drawn on a computer are pseudo-random). So the idea here is that you can reproduce the same figure, with the exact same results; that doesn't mean the numbers are not random when drawn.

Also, because of the way it is coded in OpenDrift, the draws are made per timestep, implying an interdependence between all particles: each timestep has a different seed (because of the use of the legacy function `random.random`), but within a timestep the series of random numbers is drawn across all the particles. That's why, in this code, each particle has its own seed for its generator, so that the particles should be independent from each other. It might seem like a detail, but I believe it is the correct way to generate randomness in this particular case. Finally, because of the nature of the simulations I make and their potential consequences, I believe it is essential that they are reproducible. I have actually started a discussion on the subject, for a modification of OpenDrift, here.

@nordam thanks for the clarification. Although I could read diffusivity from the models I use, the state-approved practice is to use two constant diffusion parameters (horizontal and vertical), as done in the above figure. This kind of makes things not great for vertical mixing, but I am not making the regulations, and there is also the issue of the scale/resolution we are supposed to achieve. Regarding the scaling, from the link you provided I understand that you have to keep the concentration constant through time? We don't have a priori information on what a "good" distribution or structure could be. Also, the computations are pretty long, sometimes several months, so I am trying not to waste too much time, by scaling the parameters in the most optimal way. An example, I think, is the Visser 1997 paper, which proposes to compensate for the lack of calculation of the vertical diffusivity gradient by using a lot of small timesteps. So OpenDrift applies mixing every minute by default, but the paper's choice is actually scaled to the gradient it uses, and I doubt this is applicable to all cases.

Thanks for your input; it really helps me get a clearer idea of the limitations of the work I do, and will hopefully lead to some ideas on how to better use OpenDrift's parametrisation.

nordam commented 10 months ago

Just a few points to clarify:

Boorhin commented 10 months ago

Thanks, that's very useful. The scaling in particular, I should have thought about it... That clarifies things for me.

ChrisBarker-NOAA commented 9 months ago

The idea of this code is that it is reproducible and more random (knowing that all the numbers drawn on a computer are pseudo-random). So the idea here is that you can reproduce the same figure, with the exact same results.

Yes, this is a good practice, and we use it in PyGNOME -- however, it is a good practice for reproducibility, but not necessarily for validation.

In this case, you not only get a random sequence of steps for the random walk; ultimately, the final results over time, for all particles, are also [pseudo]random. So the difference between the realized and analytical solutions that you are examining is, in fact, random. When fixing the seed, you are looking at only one realization for each set of parameters, and what I suspect is that the difference is random as well. In order to know whether a given timestep (or distribution) systematically results in a closer-to-analytical solution, you'll need to look at more than one realization and see if there is a pattern.
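A sketch of that multiple-realizations check (all parameters are illustrative): repeat the walk with different seeds and look at the spread of the final-variance error:

import numpy as np

D, dt, Tmax, Np = 0.1, 600.0, 6*3600, 10_000
n_steps = int(Tmax/dt)

errors = []
for seed in range(20):                      # 20 independent realizations
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_steps, Np))
    Z = np.sqrt(2*D) * dW.sum(axis=0)       # final positions
    errors.append(abs(Z.var() - 2*D*Tmax))  # deviation from theory

# The error itself varies from realization to realization
print(np.mean(errors), np.std(errors))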

@nordam: the code was already using 10,000 particles -- I think that's plenty -- in fact, I reduced it to 1,000 (I'm impatient) and the results looked very similar (not a quantitative analysis).

In regards to timestep -- I think the timestep is not very important, as long as the number of timesteps is large enough (in my previous investigations "large enough" was pretty small -- on the order of 10-20), so for a long run there should be no need for a smaller diffusion timestep.

But one note on that: as @nordam pointed out earlier, there may be an issue if the random walk pushes a particle out of the domain -- so that could be your criterion for the largest allowable diffusion timestep -- and that is easier to reason about with a uniform distribution.
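To illustrate (a sketch with made-up numbers): with the uniform scaling the largest possible step is known exactly, while a Gaussian step can only be bounded in probability:

import numpy as np

D, dt = 0.1, 600.0  # illustrative values

# Uniform scaling sqrt(6*D/dt)*U(-1,1), integrated over dt: hard bound
max_uniform_step = np.sqrt(6*D*dt)

# Gaussian step has std sqrt(2*D*dt) but no hard bound
sigma = np.sqrt(2*D*dt)

print(f'uniform: |step| <= {max_uniform_step:.1f} m')
print(f'gaussian: 99.7% of steps within +/- {3*sigma:.1f} m, no hard bound')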

Hopefully more later.

Boorhin commented 9 months ago

Thanks again for your input. I misunderstood the numpy documentation about parallel seed generation, and I have corrected my scripts accordingly. I also made a plot of the scale of the jumps for the values we use, which greatly clarified things for me. It could be a thing to add to the documentation, for people like me.

[image: scaling_dtVSresolution]

Regarding the enforcement of a small timestep of 60 s for the vertical diffusion, as per Visser 1997: when you have a constant vertical diffusivity, I feel it is useless extra computation, considering that a typical run for me is 600-900 timesteps minimum and the vertical mesh resolution is moderately poor. That's why I generally override the vertical mixing time. Maybe the use of a constant vertical diffusivity should not force extra computations in OpenDrift? Especially when we don't have any way of estimating the diffusivity gradient?

Thanks again

ChrisBarker-NOAA commented 9 months ago

> I also made a plot to see the scale of the jumps with the values we use

:-)

Now that I think about it, the scale of the jumps should pretty much take care of itself -- the eddy diffusivity is constrained by any boundaries that may exist anyway -- you can't have large eddies in a small space.

Boorhin commented 9 months ago

In OpenDrift, the particles bounce on the coast, seabed and surface with the parameters I use. I am just a bit concerned at the moment about how the vertical diffusion is computed with a constant value. It seems like this should be revised, but I am not the one making the rules...

ChrisBarker-NOAA commented 9 months ago

I think it's OK for them to bounce -- that's how a no-flow boundary condition is handled -- we just want a HUGE bounce :-)