gempy-project / gempy

GemPy is an open-source, Python-based 3-D structural geological modeling software, which allows the implicit (i.e. automatic) creation of complex geological models from interface and orientation data. It also offers support for stochastic modeling to address parameter and model uncertainties.
https://gempy.org
European Union Public License 1.2
934 stars 231 forks source link

advice on improving ill-conditioned matrices #105

Closed cfandel closed 5 years ago

cfandel commented 5 years ago

I am adding data to refine a GemPy model of a field site, but most of the combinations of additional data points that I try result in a scipy warning that the resulting matrix is ill-conditioned. I've tried both adding many more data points, and trimming down to minimal datasets, but I still get this problem. Any tips on how to select data to avoid this problem? Or is it OK to ignore the warning? The resulting model looks reasonable, even when the matrix is ill-conditioned.

cfandel commented 5 years ago

PS: example notebook here https://github.com/cfandel/gottesacker/blob/master/gott_v5.ipynb see section 43

javoha commented 5 years ago

Hi, it might be possible to get ill-conditioned matrices in the universal cokriging interpolation when your data overlaps or is located very close to each other. Maybe check if that is the case for the data from your different data sets. As the condition number is only a measurement of the accuracy of the solution of the system, it seems correct that your results still look reasonable.

cfandel commented 5 years ago

Hm so is it the distance of data points from each other in XYZ space, or only in XY space that matters? Because I'm using a lot of cross-section data where I have multiple interface points at different elevations for different formations that are stacked vertically on the same XY point. Would that be likely to cause problems? Thanks!

javoha commented 5 years ago

Only the total distance in XYZ space matters, if the elevation for the different layers differ that is fine. I was just thinking if you have multiple data sets, there might be data containing nearly the same information (meaning very close XYZ coordinate and same layer boundary), which might cause the problem.

ejhgeo commented 5 years ago

I have also had problems with ill-conditioned matrices for the same reason. So far I have interface data from two seismic reflection profiles crossing a subduction zone and a grid of bathymetry data defining the seafloor. As of now the structure model that GemPy interpolates is total crap and looks nothing like what we know these cross-sections should look like. I do probably have some overlapping points that I can try to get rid of. I'm also assuming that in addition to having interface points stacked at different elevations under the same XY point, the disparity between my lateral and depth extents could be contributing to singular values? For instance, my cross-sections are roughly 200 km long but the maximum depth right now is only 8 km.

Also, the top layer of my model is the ocean, so obviously my fault (the plate boundary) should not penetrate into the water, but GemPy doesn't know this, so the model gets screwed up and the fault shows up more like a layer than a fault. So, my questions are:

1) Would adding deeper points (I have yet to include the Moho for instance) help prevent the ill-conditioning? What else can I do to prevent this? 2) Can GemPy handle the spatial extent and complexity of a subduction zone? 3) How do I prevent GemPy from extending the fault into the ocean layer? (Also, the seafloor interface doesn't even show up in the model even though it has the largest number of interface points.)

I feel like it should be pretty easy, with the density of points I have, to interpolate a surface (or at least a line along these 2D profiles and a surface for the seafloor grid) from this data, yet GemPy is not producing any reasonable results. Any help is appreciated.

javoha commented 5 years ago

Hi,

the first thing you might want to check, is if your series are set correctly. For example the seafloor should be in an own series to make sure it does not affect the location of the other layers. As you said, removing duplicate points or points very close to each other might improve the conditioning of the matrices. Is the data confidential or can you provide a screenshot of what the model lookls like and what it should look like? Do you use any orientation data at all. Try adding a single orientation with roughly normal direction for each surface.

ejhgeo commented 5 years ago

My seafloor is set as its own series. Should the fault be defined above or below this? In the docs it says faults have to be define at the top of the pile, but if I don't want the fault to penetrate the ocean, then shouldn't it be defined below the seafloor?

I have at least one orientation (which are completely estimated btw) per Series (otherwise gempy won't even run). I was assuming orientations defined the direction of dip of the interface. Are you saying they should define the direction normal to the interface?

This is what gempy gives my as a cross-profile: profile

This is what gempy gives in 3D view:

3d view

This is what the structure should look like for that cross-profile (note that this model goes deeper than the data I have in GemPy right now). But GemPy should at least be getting this shallow structure, especially the seafloor correct. AGU_Figure6_Density_MCS01_Model8_AbsDensity.pdf

javoha commented 5 years ago

We are currently working on fault behavior, so that it will be possible to order faults and series correctly - faults will then only offset older units and not affect younger ones (or in your case the seafloor). The release is planned for the end of this month or early april. Regarding orientations: Entering the dip direction is part of it. Your orientations are set correctly, if they appear as surface normals in the plot (which they do not in your case). Here is an example of how it should look like, including the orientation input:

example orientations

The cross-section you provided is based on densities, but from looking at the legend of the GemPy plot I assume that both the area on the left and on the right side of the subduction zone are different series (which would be correct).

ejhgeo commented 5 years ago

I have put together a synthetic data set that is a much simpler version of a subduction zone. It's cross-section look something like this (blue is seafloor interface, green is moho interface, red is the fault interface): Cross-section-schematic I have my orientations defined as in the example above (with dip given as the normal to the slope of the interface measured from 0 degrees). Sometimes the orientations show up correctly as vectors and sometimes they show up as those hexagonal points, even without changing anything in the orientation data.

With this synthetic data set, I am not getting the ill-conditioned warning, but the model makes no sense:

The two plates are defined as separate series. The seafloor is its own series. The fault is its own series. The seafloor doesn't even show up in the model and the layers are in totally the wrong places: ModelResult_slice35

This is just a really simple block model using two cross profiles at data. GemPy should be able to do this. What am I doing wrong here? I've attached my synthetic data points and orientations and my code as it stands right now.
synthetic_data_GemPy_Input_Points.txt Synthetic_Data_Orientations2.txt

import sys, os
import gempy as gp
get_ipython().run_line_magic('matplotlib', 'inline')
import numpy as np
import matplotlib.pyplot as plt
import pandas as pn

path = "Users/ehightower/Documents/Research/SISIE"
os.chdir("/")
os.chdir(path)
get_ipython().system('pwd')

interfaces_file = "synthetic_data_GemPy_Input_Points.csv" #"MCS01_Horizons_GemPyInput2.csv"
orientations_file = "Synthetic_Data_Orientations.csv"
data = pn.read_csv(interfaces_file)
padding = 200000
minX = -50000 #data['X'].min() - padding
maxX = data['X'].max() + padding
minY = 0 # data['Y'].min() - padding
maxY = data['Y'].max() + padding
minZ = data['Z'].min() - 10000
maxZ = 0
data.head()

res = 100
geo_data = gp.create_data([minX,maxX,minY,maxY,minZ,maxZ],[100,100,100],
                          path_i = os.pardir+"/SISIE/"+interfaces_file,
                          path_o = os.pardir+"/SISIE/"+orientations_file)

geo_data.interfaces
geo_data.interfaces.info()

geo_data.grid.values

gp.get_data(geo_data)

# For Synthetic Data
gp.set_series(geo_data, {"Sea_Series":'sea',
                         "Fault_Series":'decoll',
                         "AU_Series":'mohoAU',
                         "PA_Series":'mohoPA'},
              order_series = ["Sea_Series","Fault_Series","PA_Series","AU_Series"],
              order_formations = ['sea','decoll','mohoPA','mohoAU'],
              verbose = 0)

# For Real Data
gp.set_series(geo_data, {"Fault_Series":'decoll',  
                         "Sea_Series":'sea',                
                         "PA_Seds_Series":('seds1_PA','seds2_PA','seds3_PA'),
                         "AU_Seds_Series":('seds1_AU','seds2_AU','seds_trench_AU')},
              order_series = ["Fault_Series","Sea_Series","PA_Seds_Series","AU_Seds_Series"],
              order_formations = ['decoll','sea','seds1_PA','seds2_PA','seds3_PA','seds1_AU','seds2_AU','seds_trench_AU'],
              verbose = 0)

get_ipython().run_line_magic('matplotlib', 'inline')
gp.get_sequential_pile(geo_data)

print(gp.get_grid(geo_data).values)

gp.get_data(geo_data, 'interfaces').head()

gp.get_data(geo_data, 'orientations').head()

get_ipython().run_line_magic('matplotlib', 'inline')
#plt.figure()
gp.plotting.plot_data(geo_data, direction='y')#, data_type='All', series='All')
#plt.show()

interp_data = gp.InterpolatorData(geo_data,
                                  output='geology', compile_theano=True,
                                  theano_optimizer='fast_compile',
                                  verbose=[])

gp.get_formation_numbers(interp_data)
interp_data.geo_data_res.get_formation_number()

gp.get_kriging_parameters(interp_data)

lith_block, fault_block = gp.compute_model(interp_data)

gp.plot_section(geo_data, lith_block[0], cell_number=25, direction='y', plot_data=True)

gp.plot_scalar_field(geo_data, lith_block[0], cell_number=25, N=25, direction='y', plot_data=True)

gp.plot_section(geo_data, fault_block[0], cell_number=25, direction='y', plot_data=True)

gp.plot_scalar_field(geo_data, fault_block[1], cell_number=25, N=15, directio='y', plot_data=True)

ver, sim = gp.get_surfaces(interp_data, lith_block[1], fault_block[1], original_scale=True)

gp.plot_surfaces_3D(geo_data, ver, sim, alpha=1)

ALSO: interp_data.geo_data_res.get_formation_number() returns the error "'InputData' object has no attribute 'get_formation_number'". Could this be contributing to the problem?

Could having all Z values negative be affecting the order of the Series? Do I need to define my series in the opposite sense because Z is negative?

I've padded the model by extending the extent past the min and max values of the data. Without doing that my two profiles were on the edges, which I figured would be affecting the boundary. How much should we extend the model past the extent of the data? And would this significantly affect the interpolation?

Is there nothing that can be done at this point to prevent the fault from penetrating the ocean layer? or would I have to wait a couple months?

Thank you

Leguark commented 5 years ago

Hi there! Interesting model case you have there. Right now we are preparing a major release where we change many of the problem you are dealing with so by now I do not want to spend to much time patching these features in the master.

But not all are bad news. We have taken your model as an example case for the new release (right now we are working on the branch gempy1.2-alpha). I don't like to give dates because I am always too optimistic but hopefully next week you could clone that branch and going on working there.

We have been discussing what we need to model what you have there and it should be done with uncomfortmities. However we need to add the onlap/erode relation between unconformties (I will try to implement it this week). The ill condition matrices should be fixed when a better definition of the surfaces series. Also your problem with the basement will be solved in gempy 1.2 #83

I am sorry for having you waiting but I really hope that the new version is worth the pain in terms of user experience.

Leguark commented 5 years ago

A bit later than expected. It is still in alpha but onlap/erode is implemented. If you need it right now you can find the code in https://github.com/cgre-aachen/gempy/blob/gempy1.2-NewTheano/notebooks/tutorials/ch1-8_Onlap_relations.ipynb

Keep in mind that you need to use that branch to run the code. We are almost done with new features so this will move to the main branch soon

image

Leguark commented 5 years ago

This model is part of the tutorials of gempy 2.0