exuberant / mdenoise

GNU General Public License v2.0

Maximum file size is limited by RAM. #1

Open volcan01010 opened 8 years ago

volcan01010 commented 8 years ago

Trying to denoise a large file (e.g. 16 million points) on my desktop computer (4 GB RAM) causes the algorithm to crash with a message similar to the following:

jsteven5@affric:/media/jsteven5/data/data/thingy/LiDAR/raw$
mdenoise -i thingy_LiDAR.xyz -o thingy_LiDAR_DN.xyz -z -n 5 -t 0.9
Input File: thingy_LiDAR.xyz
Neighbourhood: Common Vertex
Threshold: 0.900000
n1: 5
n2: 50
Read Model...
Triangulation...
    77.250 seconds

Error malloc:  Out of memory.
The model data is too big.

Can the algorithm be modified to use the disk in cases when the dataset is too large for RAM? @exuberant may have some suggestions.

volcan01010 commented 8 years ago

Link to the large .xyz file (96 MB): thingy_LiDAR.xyz

stefanocudini commented 7 years ago

Hi @volcan01010, 4 GB of RAM may be too little for any processing of raster GIS data at this size.

Try splitting the raster first and joining the tiles back together after processing.

miccoh1994 commented 6 years ago

Hi,

I am having the same issue, even though I'm working on a machine with 128 GB of RAM.

I am working on Windows 10.

The file size is 250 MB.

dvalters commented 5 years ago

Potential workarounds that don't require rewriting the code: if the vectors/arrays are heap allocated (and they presumably must be, since their size is determined at run time), the program can use all available virtual memory space. The limit is not the amount of physical RAM the computer has, but the total contiguous space available in virtual memory. Once physical RAM is used up, allocations should spill over onto a swap partition on disk; that behaviour depends on the OS and its configuration, not on the algorithm or on C itself. (But it will be incredibly slow once it is paging to swap...)

You can check whether a virtual memory limit has been set on Linux with ulimit -v. (I can't remember the Windows options for this off the top of my head, but similar settings are available.) If there are any stack-allocated arrays, the stack size on Linux can be viewed or set with ulimit -s.

Edit: In Windows: Control Panel > System > Advanced system settings > Advanced > Performance > Settings... > Advanced > Virtual memory > Change...
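For example, from a Linux shell (a rough sketch; whether you can raise these limits depends on your system configuration):

# Check the current limits (values are in kilobytes, or "unlimited")
ulimit -v
ulimit -s
# Lift the virtual memory limit for this shell session, if your system permits it
ulimit -v unlimited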

zdila commented 4 years ago

I have the same problem, so I tried splitting the big file with gdal_retile.py, denoising every tile separately and then merging them back together with gdal_merge.py. Unfortunately, the edges of the glued tiles are not smooth:

[image]

After gdaldem hillshade:

[image]
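A minimal sketch of the split / denoise / merge approach described above (tile size, directory and file names are illustrative, as is the raster-to-XYZ conversion step; mdenoise options are the ones from the original report):

mkdir -p tiles
gdal_retile.py -ps 2000 2000 -targetDir tiles dem.tif
for t in tiles/*.tif; do
    gdal_translate -of XYZ "$t" "${t%.tif}.xyz"            # mdenoise reads .xyz
    mdenoise -i "${t%.tif}.xyz" -o "${t%.tif}_dn.xyz" -z -n 5 -t 0.9
    gdal_translate "${t%.tif}_dn.xyz" "${t%.tif}_dn.tif"   # back to a raster
done
gdal_merge.py -o dem_dn.tif tiles/*_dn.tif
gdaldem hillshade dem_dn.tif dem_dn_hillshade.tif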

zdila commented 4 years ago

gdal_retile.py with -overlap 40 helped. After denoising, I crop the tiles with the following script:

#!/bin/bash
# Read the tile width and height from gdalinfo's "Size is X, Y" line,
# then trim the 20-pixel overlap from each edge (40 pixels in total per axis).
read x y <<< "$(gdalinfo "$1" | grep 'Size is' | tr -c '[:digit:]' ' ')"
gdal_translate -srcwin 20 20 $((x - 40)) $((y - 40)) "$1" "$2"

It is all part of a Makefile, and as a bonus I can parallelize denoising (make -j 24).
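Put together, the overlap-and-crop version looks roughly like this (a sketch; tile size, directory names and the name crop.sh for the script above are illustrative):

mkdir -p tiles denoised cropped
gdal_retile.py -ps 2000 2000 -overlap 40 -targetDir tiles dem.tif
# ... denoise each tile as in the earlier sketch, writing the results to denoised/ ...
for t in denoised/*.tif; do
    ./crop.sh "$t" "cropped/$(basename "$t")"   # trim the 20-pixel overlap from each edge
done
gdal_merge.py -o dem_dn.tif cropped/*.tif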

volcan01010 commented 4 years ago

Hi @zdila,

Unfortunately, MDenoise hasn't been in active development for some time. Recently, WhiteboxTools added a Feature Preserving Smoothing algorithm that is based on MDenoise. Perhaps you will have more luck with that:

https://jblindsay.github.io/wbt_book/available_tools/geomorphometric_analysis.html?highlight=denoise#featurepreservingsmoothing
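An invocation looks roughly like this (a sketch only; the input file name is illustrative and the parameter values are defaults, so check the exact options against the WhiteboxTools documentation):

whitebox_tools -r=FeaturePreservingSmoothing --dem=dem.tif -o=dem_smoothed.tif --filter=11 --norm_diff=15.0 --num_iter=3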

I hope that helps.

zdila commented 4 years ago

@volcan01010 thanks for the info. I am actually having success with mdenoise (see my comment https://github.com/exuberant/mdenoise/issues/1#issuecomment-650628286), but I will give it a try to see if it produces a better result.