GeostatsGuy / GeostatsPy

GeostatsPy Python package for spatial data analytics and geostatistics. Started as a reimplementation of GSLIB, Geostatistical Library (Deutsch and Journel, 1992) from Fortran to Python, Geostatistics in a Python package. Now with many additional methods. I hope this resources is helpful, Prof. Michael Pyrcz
https://pypi.org/project/geostatspy/
MIT License
479 stars 183 forks source link

Correlated log-normally distributed random field #10

Closed hodaj closed 5 years ago

hodaj commented 5 years ago

Is it possible to generate a correlated log-normally random field using GeoStatTools? If so would you please give me some hints?

geomodeller commented 5 years ago

Hi,

So you want to have a pair of two correlated fields whose distributions follow log-normal? Am I correct? If so, I suggest you to apply co-kriging or co-simulation: 1) Take a log to your data, if any (Kriging works with normal distribution) 2) Generate one random field with Kriging - use GeostatsPy.krige (for a single model) or GeostatsPy.sgsim (for multiple models) 3) Generate another random field with co-kriging - also use GeostatsPy.krige or GeostatsPy.sgsim, but you need to define 2nd data (i.e., the result of step (2)) and the correlation coefficient 4) Take exponential to the both results

I also recommend to refer to Dr. Pyrcz's workflow: https://github.com/GeostatsGuy/GeostatsPy/blob/master/working/sgsim_alpha3.ipynb

Hope this will be helpful for you.

hodaj commented 5 years ago

Hi, Thanks for the prompt response. No I just want to make one spatially correlated random field with log-normal distribution. From your comment, I would guess I have to create a correlated normal field and then transform it to a lognormal one. The problem is that the new field (the log-normal one) will not have the specified correlation length. So the question is if there is a way than I can generate log-normally distributed random fields with a certain mean, variance and correlation length?

Cheers, Hoda

geomodeller commented 5 years ago

Oh, my bad. So you want to have a (spatially) correlated field which should follow log-normal. This is a really interesting question... I haven't thought about that before deeply.

Since 'kiging' and 'SGS' estimate the value in an unknown location with a linear combination of known values, according to the central limit theorem, it is mathematically sound for us to work with a variable which is normally distributed. If a random variable X follows log-normal then the logarithm of X is normally distributed. Therefore, we can generate a field of a new random variable Y (= log(X)) and then transform it back to log-normal distribution by taking exponential to Y at the end.

And let's say the mean and variance of X are 'X_mean' and 'X_var', respectively. Then Y_mean = log(X_mean/sqrt(1+X_var/X_mean^2)) and Y_var = log(1+X_var/X_mean^2). Therefore, we can set the mean and variance of Y using the above relation and if so, your final field will follow the specific mean and variance you set. (Refer to https://en.wikipedia.org/wiki/Log-normal_distribution#Notation)

The correlation length is the maximum distance of correlation. If two variables (the values at a different location) are only correlated within such distance, regardless of their distribution, I believe that their maximum correlation length should be same. Simply X_cor_length = Y_cor_length. (But I am not 100% sure about this. Please do double check)

Thank you, Honggeun

MuellerSeb commented 5 years ago

Hey there, I think hoda is talking about random field Generation, which differs from kriging, since it's not an interpolation of given data. An option is given under the attached issue based on GSTools: https://github.com/GeoStat-Framework/GSTools/issues/26

Cheers and keep up the good work, Sebastian

geomodeller commented 5 years ago

Thank you for clarifying Sebastian!

Yea, definitely with 'sequential Gaussian simulation (SGS)', we can generate multiple random fields. SGS is based on kriging but it 1) sequentially generates values with Monte-Carlo simulation and 2) the computed values are considered 'known' value for the next step. If you want to know more about this, check this link: https://youtu.be/3cLqK3lR56Y. This is a lecture from Dr. Pyrcz's geostatisitcs course.

I also wrote down workflow using geostatspy package. Here you can only change the 'mean_log'/'variance log' and 'Cor_length', as you wish.

import pandas as pd
import matplotlib.pyplot as plt
import geostatspy.geostats as geostats
import geostatspy.GSLIB as GSLIB
import numpy as np
import seaborn as sns

## Define the size of map:
# - nx / ny: number of grid cells in x and y direction
# - xsiz / ysiz : size of individual grid cell
nx = 100; ny = 100; xsiz = 1.0; ysiz = 1.0; 

## Define statistics we goal:
# Cor_length : maximum distance where two values are correalated 
# Mean_log/Var_log : mean and variance of a random variable (X) which follows log-normal dist.
# Mean_normal/Var_log : mean and varinace of a random variable (Y = log(X)) which follows normal dist. 
Cor_length = 100;
Mean_log = 1; Var_log = 1
Mean_normal = np.log(Mean_log/(1+Var_log/Mean_log**2)**(1/2)); 
Var_normal = np.log(1+Var_log/Mean_log**2)

## Define variogram model based on the above statistics:
vario = GSLIB.make_variogram(nug=0.0,nst=1,it1=1,cc1=1,azi1=91.0,hmaj1=Cor_length,hmin1=Cor_length)

## Keep this default:
xmn = xsiz/2; ymn = ysiz/2; 
nxdis = 1; nydis = 1
ndmin = 0; ndmax = 20; radius = Cor_length; ktype = 0; skmean = 0
tmin = -999; tmax = 999

## Feed first three variables from MCS:
df = {'X':[1,50,99],'Y':[1,50,99],'Value':np.random.normal(size= 3)}
df = pd.DataFrame(df)

## Sequntial Gaussian Simulation:
sim_normal = geostats.sgsim(df,'X','Y','Value',wcol=-1,scol=-1,tmin=tmin,tmax=tmax,itrans=1,ismooth=0,dftrans=0,tcol=0,twtcol=0,
            zmin= -1.5,zmax= 1.5,ltail=1,ltpar=0.3,utail=1,utpar=0.3,nsim=1,
            nx=nx,xmn=xmn,xsiz=xsiz,ny=ny,ymn=ymn,ysiz=ysiz,seed=73073,
            ndmin=ndmin,ndmax=ndmax,nodmax=20,mults=0,nmult=2,noct=-1,radius=radius,radius1=1,sang1=0,
            mxctx=10,mxcty=10,ktype=ktype,colocorr=0.0,sec_map=0,vario=vario)
sim_normal = (sim_normal-sim_normal.flatten().mean())/sim_normal.flatten().std()*Var_normal**(1/2)+Mean_normal
sim_log = np.exp(sim_normal)

## Visualize the results:
plt.imshow(sim_normal); plt.colorbar()  # Random field in normal 
plt.imshow(sim_log); plt.colorbar()     # Random field in log-normal
sns.distplot(sim_normal.flatten())      # Histogram of normal field
sns.distplot(sim_log.flatten())         # Histogram of log-normal field

You may get the following results: 1) Log-normal field log_normal 2) Histogram of the log-normal field log_normal2

And by changing the random seed in the function, you can easily generate multiple cases.

Hope this may help you :) Honggeun

geomodeller commented 5 years ago

I just refereed to Dr. Pyrcz code and only revised its scale and name of variables for you to understand better.

hodaj commented 5 years ago

Thanks Honggeun. This is very helpful.