Closed: AndrewAnnex closed this issue 4 years ago.
How many points does the DEM have? Is it breaking when you load it with GDAL, or when you try to interpolate on it?
With Theano's fast_run flag, gempy can digest up to a couple million points to interpolate. It also depends on the input, so it is difficult to estimate a maximum.
I guess the solution would be to do some preprocessing in GDAL before creating the gempy grid. @elimh, do you have any suggestions off the top of your head?
The full-resolution DEM is a ~6849x14648 raster (rounding up).
So far I have made progress by downsampling from 1-meter to 4-meter-per-pixel resolution, which results in a 1712x3662 raster; that file is only 24 MB. That said, there is a large portion of the topography where I do not need such high resolution, but unfortunately it sits between two regions where I do want more resolution.
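The downsampling step above can be sketched with plain numpy block-averaging (the `downsample_dem` helper and the synthetic array are hypothetical; in practice something like GDAL's warp/resampling tools would operate on the raster file directly):

```python
import numpy as np

def downsample_dem(dem: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a 2-D elevation array by averaging factor x factor pixel blocks."""
    # Crop so both dimensions divide evenly by the factor.
    rows = (dem.shape[0] // factor) * factor
    cols = (dem.shape[1] // factor) * factor
    blocks = dem[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Example: a synthetic 1 m DEM downsampled to 4 m per pixel.
dem_1m = np.random.rand(6848, 14648)
dem_4m = downsample_dem(dem_1m, 4)
print(dem_4m.shape)  # (1712, 3662)
```

Block-averaging also acts as a mild low-pass filter, which avoids the aliasing that plain nearest-neighbor subsampling can introduce.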
So it would be useful to have some strategy for handling these cases: smaller disjoint meshes of the regions I care about could be one approach, or doing the mesh simplification outside of numpy/gempy could be another.
I am definitely not that experienced with meshes, so it is surprising to me how memory-inefficient these data types are (a 300 MB raster DEM converted into an STL file becomes several GB), and I am just not familiar with best practices in this area.
Oh, and I am using fast_run. So far I am doing all this on a MacBook Pro with 16 GB of RAM, but I also have a beefier desktop that I haven't tested gempy with yet.
VTK has some decimation algorithms that help to remove data in the boring areas while keeping it in the areas with high detail. But yes, 100M points is way too much; the way gempy loops over the grid within Theano is not very efficient (because of automatic-differentiation limitations).
If you still want to use so many points, in a sequential manner the best you can do is split the grid into 1M-point chunks and, in each iteration: (i) set the new chunk of topography, (ii) compute a gempy model, (iii) store the solution...
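The chunked loop might look like the skeleton below. The splitting and reassembly are plain numpy; the gempy calls in the comments are approximations of the real API (exact names and signatures may differ between versions), so the runnable part takes a generic `compute_fn` callback instead:

```python
import numpy as np

def compute_in_chunks(topo_xyz: np.ndarray, compute_fn, chunk_size: int = 1_000_000):
    """Split an (N, 3) topography point array into chunks, run a model
    on each chunk, and concatenate the per-chunk solutions."""
    solutions = []
    for start in range(0, len(topo_xyz), chunk_size):
        chunk = topo_xyz[start:start + chunk_size]
        # (i) set the new chunk of topography / custom grid, e.g. roughly:
        #       gp.set_custom_grid(geo_model.grid, chunk)
        # (ii) compute a gempy model on that grid, e.g. roughly:
        #       sol = gp.compute_model_at(chunk, geo_model)
        solutions.append(compute_fn(chunk))  # (iii) store the solution
    return np.concatenate(solutions)

# Toy example: the "model" just returns the elevation column.
pts = np.random.rand(2_500, 3)
result = compute_in_chunks(pts, lambda c: c[:, 2], chunk_size=1_000)
print(result.shape)  # (2500,)
```

Keeping the chunking outside the model call means the peak memory is bounded by one chunk plus the stored solutions, rather than the full grid.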
Hmm, the way gempy handles raster formats could definitely be optimized in terms of memory usage, but your file is huge. Do you want to calculate the model in the entire extent of the DEM? Then it might be useful to divide it into smaller pieces for the calculation and then reassemble the results, like Miguel said. Or, if you find an approach to customize the mesh resolution to your needs, you could use the resulting xyz coordinates to define a custom grid.
Currently I do attempt to use the entire extent of the DEM (I actually initialize the model from the xyz extent of the DEM). The solution of only loading the chunks of topography I truly care about seems workable, but I have not experimented with customizing the grids yet. I'll read through those docs eventually and see if I can get that working; if not, I'll respond here with questions. I'll also look into using the VTK decimation functions on the topography...
I'll close this for now and reopen when I have the chance to try out the provided solutions. Thanks!
Okay, thanks again for the suggestions. I am starting to write the workflow but could use some suggestions on the proper way to do things with the API. So far here are the steps I follow; I can post code if desired.
Hopefully this is the right approach; please let me know if there are faster or easier ways to do this. I also tried to produce a mesh with customized resolution, but I quickly ran into memory-allocation errors with that approach, as the resulting mesh is probably still too dense. However, I now have good meshes to use in ParaView after exporting the gempy surfaces, so it was worth the effort!
Okay, so I made some progress in understanding by switching to custom grids and using compute_model_at to return a collection of model solutions. Now I just need to sort out the best way to combine them into a single solution object.
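Recombining the per-tile results could be as simple as stitching the per-tile arrays back into the full raster, assuming the tiles were cut on a regular grid in row-major order (a sketch with hypothetical tile arrays; real gempy solution objects hold several arrays, each of which would be stitched the same way):

```python
import numpy as np

def stitch_tiles(tiles, tile_grid_shape):
    """Reassemble a row-major list of 2-D solution tiles into one raster."""
    rows, cols = tile_grid_shape
    tile_rows = [np.hstack(tiles[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(tile_rows)

# Four 2x3 tiles arranged on a 2x2 tile grid -> one 4x6 raster.
tiles = [np.full((2, 3), i) for i in range(4)]
full = stitch_tiles(tiles, (2, 2))
print(full.shape)  # (4, 6)
```

This only works if every tile in a row has the same height and every tile in a column has the same width, which is worth asserting when the tiles come back from the model with unexpected dimensions.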
So I looked into this again. I suppose I was not appreciating how the solutions are stored and how the resolution of the topography feeds into the computational cost. I am able to tolerate a much lower "resolution" (density of vertex placement) in the meshes computed from the model, so I downsampled my DEM to 12-meter resolution (from the source 1-meter resolution) and was able to compute models with reasonable voxel counts.
I think the workflow above should still be investigated, though. I may, for example, wish to calculate a model over the full spatial domain at low resolution but have greater topographic resolution in small sections (maybe the concept of a 3D section, in addition to the 2D sections, could be implemented).
Thank you for delving so deeply into the problem. The topography module was not written by me, so it is difficult for me to help off the top of my head. But if I am not mistaken, the values at the topography (i.e. geomodel.grid.topography.values) are stored in geomodel.solutions.geological_map, and this is independent of the regular grid.
You can always activate and deactivate grids, so you could just set up a topography grid and deactivate the regular grid; the same goes for cross sections.
I am not sure if I am missing something, but I think what you mention is doable. In any case, I am sure a tutorial notebook with everything you are exploring would be of great help to alleviate the future suffering of someone else :)
@Leguark okay, so that is something to look at a bit more carefully. I think my solution almost worked, but loading the DEM tiles as custom grids and using the compute_model_at function produced tiles of solutions with different dimensions than I expected compared to running the model against the full topography.
Once I work it out fully, I will definitely post it here for others with similar issues.
closed by code in #241
Is your feature request related to a problem? Please describe. What strategy is there for simplifying the geometries of very large DEMs? I am loading a 1-meter-per-pixel, 5x20 km DEM raster file that produces a 20 GB numpy mesh file. Besides downsampling my DEM, which I do not want to do in areas of high information content, is there a way for gempy to smartly downsample or simplify the resulting mesh?
Describe the solution you'd like A transparent way to specify a maximum memory footprint or vertex count; maybe a way to load a triangulated mesh after simplification through MeshLab; maybe a quad-tree-based LOD loading scheme built on GDAL pyramids of the raster.
Describe alternatives you've considered Loading subsets of my DEM at full resolution. Is there a way to combine multiple topographies that are disjoint?