gempy-project / gempy

GemPy is an open-source, Python-based 3-D structural geological modeling software, which allows the implicit (i.e. automatic) creation of complex geological models from interface and orientation data. It also offers support for stochastic modeling to address parameter and model uncertainties.
https://gempy.org
European Union Public License 1.2

loading large DEMs #230

Closed: AndrewAnnex closed this issue 4 years ago

AndrewAnnex commented 4 years ago

Is your feature request related to a problem? Please describe.
What strategy is there for simplifying the geometries of very large DEMs? I am loading a 1 m-per-pixel, 5x20 km DEM raster file that produces a 20 GB numpy mesh file. Besides downsampling my DEM, which I do not want to do in areas of high information content, is there a way for gempy to smartly downsample or simplify the resulting mesh?

Describe the solution you'd like
A transparent way to specify a maximum in-memory size or vertex count; maybe a way to load a triangulated mesh after simplification through MeshLab, or a quadtree-based LOD loading scheme built on GDAL pyramids of the raster.

Describe alternatives you've considered
Loading subsets of my DEM at full resolution. Is there a way to combine multiple topographies that are disjoint?

Leguark commented 4 years ago

How many points does the DEM have? Is it breaking just when you load it with gdal, or when you try to interpolate on it?

With the fast_run flag of theano, gempy can digest up to a couple of million points to interpolate--it also depends on the input, so it is difficult to estimate a maximum.

I guess the solution would be to do some preprocessing in gdal before creating a gempy grid. @elimh do you have any suggestion off the top of your head?

AndrewAnnex commented 4 years ago

The full-resolution DEM is a ~6849x14648 raster (rounding up).

So far I have made progress by downsampling from 1 meter to 4 meters per pixel, which results in a 1712x3662 raster. That file is only 24 MB in size. That said, there is a large portion of the topography that I do not need at such high resolution, but unfortunately it sits between two regions where I do want more resolution.

So it would be useful to have some sort of strategy to handle these cases, smaller disjoint meshes of regions I care about could be one approach, or a way to do the mesh simplification outside of numpy/gempy could be another.

I am definitely not that experienced with meshes, so it is just surprising to me how memory-inefficient these data types are (a 300 MB raster DEM converted into an STL file will be several GB). I am just not that familiar with best practices in this area.

Oh, and I am using fast_run. So far I am doing this all on a MacBook Pro with 16 GB of RAM, but I also have a beefier desktop to work with that I haven't tested gempy on yet.

Leguark commented 4 years ago

vtk has some decimation algorithms that help to remove data in the boring areas and keep it in the areas with high detail. But yes, 100M points is way too much; the way gempy loops over the grid within theano is not very efficient (because of automatic-differentiation limitations).
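For illustration, a minimal sketch (not from this thread) of that kind of decimation using PyVista, a VTK wrapper; the elevation array here is a random placeholder standing in for a real DEM:

```python
import numpy as np
import pyvista as pv

# Placeholder DEM: replace with elevations read via GDAL/rasterio
nx, ny = 400, 400
x, y = np.meshgrid(np.arange(nx, dtype=float), np.arange(ny, dtype=float))
z = np.random.rand(ny, nx) * 100.0

# Build a surface mesh from the DEM and triangulate it for decimation
grid = pv.StructuredGrid(x, y, z)
surf = grid.extract_surface().triangulate()

# Drop ~90% of the triangles; quadric decimation removes the most points
# where the surface is flat and keeps detail in high-curvature areas
decimated = surf.decimate(0.9)
decimated.save("topography_decimated.vtk")
```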

If you still want to use so many points, the best you can do is to process them sequentially: split the grid into 1M-point chunks and, in each iteration, (i) set the new chunk of topography, (ii) compute a gempy model, and (iii) store the solution...
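A bare skeleton of that loop, assuming an already-created geo_model and the gempy 2.x set_topography/compute_model calls (the tile filenames are hypothetical):

```python
import gempy as gp

tile_paths = ["dem_chunk_00.tif", "dem_chunk_01.tif"]  # hypothetical 1M-point tiles
stored = []
for path in tile_paths:
    geo_model.set_topography(source='gdal', filepath=path)  # (i) new topography chunk
    sol = gp.compute_model(geo_model, compute_mesh=False)   # (ii) compute the model
    stored.append(sol.geological_map.copy())                # (iii) store the solution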

elimh commented 4 years ago

Hmm, the way gempy handles raster formats could definitely be optimized in terms of memory usage, but your file is huge. Do you want to calculate the model in the entire extent of the DEM? Then it might be useful to divide it into smaller pieces for the calculation and then reassemble the results, like Miguel said. Or, if you find an approach to customize the mesh resolution to your needs, you could use the resulting xyz coordinates to define a custom grid.
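A hedged sketch of that last option, assuming an existing geo_model, a gempy 2.x set_custom_grid method, and a hypothetical decimated-mesh file like the one produced above:

```python
import numpy as np
import pyvista as pv

# xyz vertices of a previously decimated topography mesh (hypothetical file)
xyz = np.asarray(pv.read("topography_decimated.vtk").points)  # (n, 3) array

# register these coordinates as a custom grid on the model
geo_model.set_custom_grid(xyz)
```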

AndrewAnnex commented 4 years ago

Currently I do attempt to use the entire extent of the DEM (I actually initialize the model from the xyz extent of the DEM). The solution of only loading chunks of the topography I truly care about seems workable, but I have not experimented with customizing the grids yet. I'll read through those docs eventually and see if I can get that working; if not, I'll respond here with questions. I'll also look into using the vtk decimation functions on the topography...

AndrewAnnex commented 4 years ago

I'll close this for now and reopen when I have the chance to try out the provided solutions. Thanks!

AndrewAnnex commented 4 years ago

Okay, thanks again for the suggestions. I am starting to write the workflow but could use some suggestions on the proper way to do things using the API. So far these are the steps I follow (a rough code sketch follows the list; I can post my actual code if desired):

  1. Outside of gempy, use gdal_retile to split the previously mentioned 1712x3662 raster into 512x512 chunks.
  2. Create the model.
  3. Call init_data, passing in the first chunk's x and y bounds but using the full DEM z bounds.
  4. Call set_interpolator with output geology, fast_compile, and compile_theano.
  5. Loop through the topography chunks:
    1. Call set_topography with the filepath of the chunk.
    2. Set the extent to the bounds of the chunk (same as step 3, updated for each chunk).
    3. Call compute_model, disabling compute_mesh for now.
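Roughly, in code (expanding the skeleton sketched earlier in the thread; gempy 2.x call names, with placeholder paths, extents, and resolution, and the extent update in step 5.2 is my best guess at the right call):

```python
import glob
import gempy as gp

tiles = sorted(glob.glob("tiles/*.tif"))  # 512x512 chunks from gdal_retile
z_bounds = [0., 1500.]                    # full-DEM z bounds (placeholder)

# steps 2-4: build the model once and compile theano once
geo_model = gp.create_model("chunked_dem")
gp.init_data(geo_model,
             extent=[0., 512., 0., 512.] + z_bounds,  # first chunk's xy bounds
             resolution=[50, 50, 50])
gp.set_interpolator(geo_model, output=['geology'],
                    theano_optimizer='fast_compile', compile_theano=True)

# step 5: loop through the topography chunks
maps = []
for i, tile in enumerate(tiles):
    geo_model.set_topography(source='gdal', filepath=tile)          # 5.1
    chunk_extent = [i * 512., (i + 1) * 512., 0., 512.] + z_bounds  # 5.2 (toy row layout)
    geo_model.grid.regular_grid.set_regular_grid(chunk_extent, [50, 50, 50])
    sol = gp.compute_model(geo_model, compute_mesh=False)           # 5.3
    maps.append(sol.geological_map.copy())                          # store per-chunk map
```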

Hopefully this is the right approach; please let me know if there are faster or easier ways to do this. I also tried to produce a mesh with customized resolution, but I quickly ran into memory allocation errors with that approach, as the resulting mesh is probably still too dense. However, I now have good meshes to use in ParaView after exporting the gempy surfaces, so it was worth the effort!

AndrewAnnex commented 4 years ago

Okay, so I made some progress in understanding by switching to custom grids and compute_model_at to return a collection of model solutions. Now I just need to sort out the best way to combine them into a single combined solution object.
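For reference, the per-tile evaluation looks roughly like this (gempy 2.x, assuming an existing, already-compiled geo_model; the coordinate array is a placeholder and the `custom` attribute is my assumption for where the values land):

```python
import numpy as np
import gempy as gp

# hypothetical (n, 3) array of DEM vertex coordinates for one tile
xyz_tile = np.random.rand(1000, 3) * np.array([512., 512., 1500.])

# evaluate the already-compiled model only at these points
sol = gp.compute_model_at(xyz_tile, geo_model)
lith_ids = sol.custom  # values at the custom grid points (attribute name assumed)
```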

AndrewAnnex commented 4 years ago

So I looked into this again. I suppose I was not appreciating how the solutions were stored and how the resolution of the topography feeds into the computational cost. I am able to tolerate a much lower 'resolution' (density of vertex placement) in the computed meshes from the model, so I downsampled my DEM to 12 meter resolution (from the source 1 meter resolution) and was able to compute models with reasonable voxel counts.

I think the workflow above should still be investigated, however. I may, for example, wish to calculate a model over the full spatial domain at a low resolution but have greater topographic resolution in small sections (maybe the concept of a 3D section, in addition to the 2D sections, could be implemented).

Leguark commented 4 years ago

Thank you for delving so deeply into the problem. The topography module was not written by me, so it is difficult for me to help off the top of my head. But if I am not mistaken, the values at the topography--the values at geomodel.grid.topography.values--are stored in geomodel.solutions.geological_map, and this is independent of the regular grid.

You can always activate and deactivate grids so you could just set up a topography grid and deactivate the regular grid--same with cross sections.
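Something along these lines, with grid names as I recall them from gempy 2.x and assuming an existing geo_model:

```python
import gempy as gp

# interpolate only on the topography grid, skipping the regular grid
geo_model.grid.set_inactive('regular')
geo_model.grid.set_active('topography')
gp.compute_model(geo_model)

# geological map evaluated at the topography vertices
geomap = geo_model.solutions.geological_map
```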

I am not sure if I am missing something, but I think what you mention is doable. In any case, I am sure a tutorial notebook with all of what you are exploring would be of great help in alleviating the future suffering of someone else :)

AndrewAnnex commented 4 years ago

@Leguark okay, so that is something to look at a bit more carefully. I think my solution almost worked, but loading the DEM tiles as custom grids and using the compute_model_at function produced tiles of solutions with different dimensions than I expected, compared to running the model against the full topography.

Once I work it out fully, I will definitely post it here for others with similar issues.

AndrewAnnex commented 4 years ago

Closed by the code in #241.