OSGeo / grass

GRASS GIS - free and open-source geospatial processing engine
https://grass.osgeo.org

[Feat] Implement GRASS_RASTER_TMPDIR_MAPSET like it exists for vector data #893

Open neteler opened 4 years ago

neteler commented 4 years ago

Is your feature request related to a problem? Please describe.

Since GRASS GIS can process enormous amounts of data, it is important not to slow processing down unnecessarily. During raster processing, a .tmp/ directory is created in the current mapset.

If the parent mapset directory is located on a slow (e.g. network) drive, this slows down the entire processing.

The current workaround, linking location/mapset/.tmp/ to another (fast) drive, isn't really user friendly.
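
For readers unfamiliar with the workaround, here is a minimal sketch of what the linking step amounts to. The function name, the `grass_tmp_` prefix and the directory layout are made up for illustration and are not part of GRASS; as noted further down in this thread, such a link can cause fatal errors when files have to be renamed across mount points.

```python
import os


def link_tmp_to_fast_drive(mapset_dir, fast_dir):
    """Replace <mapset_dir>/.tmp with a symlink to a directory below fast_dir.

    Hypothetical helper, not a GRASS function.
    """
    mapset_dir = os.path.normpath(mapset_dir)
    tmp_link = os.path.join(mapset_dir, ".tmp")
    # Assumed naming scheme for the directory on the fast drive.
    target = os.path.join(fast_dir, "grass_tmp_" + os.path.basename(mapset_dir))
    os.makedirs(target, exist_ok=True)
    if os.path.islink(tmp_link):
        os.unlink(tmp_link)
    elif os.path.isdir(tmp_link):
        # os.rmdir only succeeds on an empty directory; a .tmp that still
        # holds files raises OSError so nothing is deleted silently.
        os.rmdir(tmp_link)
    os.symlink(target, tmp_link)
    return target
```

Having to run something like this by hand for every mapset is the part that isn't user friendly.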

Describe the solution you'd like

The desired solution is to implement support for a new GRASS_RASTER_TMPDIR_MAPSET variable, like the one that already exists for vector data (GRASS_VECTOR_TMPDIR_MAPSET).

This would require changing all hardcoded .tmp/ occurrences to use the variable, especially in lib/init/grass.py, lib/gis/open.c, lib/gis/file_name.c, the raster library, etc.
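
To make the request concrete, here is a rough sketch of the lookup such a variable could drive. It assumes the proposed GRASS_RASTER_TMPDIR_MAPSET would behave like the vector variable (value `0` moves temporary data to TMPDIR, anything else keeps it in the mapset); that semantics is an assumption, and `raster_tmp_dir` is a hypothetical name, not an existing GRASS function.

```python
import os
import socket
import tempfile


def raster_tmp_dir(mapset_path, env=None):
    """Return the directory where temporary raster data would be created.

    Hypothetical helper illustrating the proposed variable.
    """
    env = os.environ if env is None else env
    if env.get("GRASS_RASTER_TMPDIR_MAPSET") == "0":
        # Assumed behaviour: fall back to TMPDIR, or the system default.
        return env.get("TMPDIR") or tempfile.gettempdir()
    # Default layout: $LOCATION/$MAPSET/.tmp/$HOSTNAME
    return os.path.join(mapset_path, ".tmp", socket.gethostname())
```

Every place that currently hardcodes `.tmp` would have to go through one such lookup instead.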

neteler commented 4 years ago

A (cleaned) search for .tmp shows these candidate files that may need to be updated:

lib/gis/file_name.c:67:  $LOCATION/$MAPSET/.tmp/$HOSTNAME. If GRASS_VECTOR_TMPDIR_MAPSET is

lib/gis/open.c:62:    is_tmp = (element && strncmp(element, ".tmp", 3) == 0);

lib/gis/tempfile.c:117:    strcpy(element, ".tmp");

lib/vector/Vlib/open.c:582:  <tt>.tmp/<hostname>/vector</tt>).
lib/vector/Vlib/open.c:649:  <tt>.tmp/<hostname>/vector</tt>).
lib/vector/Vlib/open.c:935:  <tt>.tmp/<hostname>/vector</tt>). If the map already exists, it is

lib/raster/rasterlib.dox:1017:Creates a new floating-point raster map (in <tt>.tmp</tt>) and returns
lib/raster/rasterlib.dox:1183:If the map is a new floating point, move the <TT>.tmp</TT> file into
lib/raster/rasterlib.dox:1188:cat = max value (for backwards compatibility). Move the <TT>.tmp</TT>

lib/raster/close.c:86: * If the map is a new floating point, move the <tt>.tmp</tt> file
lib/raster/close.c:92: * the <tt>.tmp</tt> NULL-value bitmap file to the <tt>cell_misc</tt>

lib/init/variables.html:320:  <tt>$LOCATION/$MAPSET/.tmp/$HOSTNAME</tt>. If GRASS_VECTOR_TMPDIR_MAPSET is

lib/init/grass.py:2073:        self.tmp_location = False
lib/init/grass.py:2074:        self.tmp_mapset = False
lib/init/grass.py:2114:            params.tmp_location = True
lib/init/grass.py:2116:            params.tmp_mapset = True
lib/init/grass.py:2123:        if params.tmp_location:
lib/init/grass.py:2136:    if params.tmp_location and params.tmp_mapset:
lib/init/grass.py:2141:    if params.tmp_location and not params.geofile:
lib/init/grass.py:2148:    if params.tmp_location and params.mapset:
lib/init/grass.py:2288:    if not params.mapset and not params.tmp_location:
lib/init/grass.py:2298:        if params.tmp_location:
lib/init/grass.py:2302:                       tmp_location=params.tmp_location, tmpdir=tmpdir)
lib/init/grass.py:2306:        elif params.tmp_mapset:
lib/init/grass.py:2308:                       tmp_mapset=params.tmp_mapset)
lib/init/grass.py:2403:        if not params.tmp_location:

display/d.legend.vect/d.legend.vect.html:51:By default the legend file is stored in grassdata/location/mapset/.tmp/user

display/d.mon/start.c:153:    /* create .tmp/HOSTNAME/u_name directory */

scripts/d.rast.edit/d.rast.edit.html:124:<p>There is no user-interrupt handling. This could leave files in .tmp

vector/v.hull/globals.h:17:#define TMPFILE "voxeltmp.tmp"

gui/wxpython/animation/frame.py:71:        # (stored in MAPSET/.tmp/)

gui/wxpython/gui_core/dialogs.py:2424:        self.tmp_file = grass.tempfile(False) + '.png'
gui/wxpython/gui_core/dialogs.py:2604:        env['GRASS_RENDER_FILE'] = self.tmp_file
gui/wxpython/gui_core/dialogs.py:2609:            self.renderfont.SetBitmap(wx.Bitmap(self.tmp_file))
gui/wxpython/gui_core/dialogs.py:2612:        try_remove(self.tmp_file)
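
A listing of this kind can be reproduced with a plain substring search over the source tree. The sketch below is one possible way to do it and is not the command that produced the list above; the function name and the set of file suffixes are assumptions. Because it matches the literal text `.tmp`, it also picks up attribute accesses such as `params.tmp_location`, which explains the unrelated grass.py hits.

```python
import os


def find_tmp_hits(root, needle=".tmp",
                  suffixes=(".c", ".h", ".py", ".html", ".dox")):
    """Return (relative path, line number, line) for each line containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced so odd encodings don't abort the scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append(
                            (os.path.relpath(path, root), lineno, line.rstrip("\n"))
                        )
    return hits
```
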

ninsbl commented 4 years ago

Yes, that would be nice indeed, esp. for processes with frequent access to temporary data (e.g. r.watershed). In some cases (e.g. lots of temporary maps, only a limited amount of final results), linking is more efficient because temporary data has to be copied to the other disk (vs. moved if on the same disk)... I guess the trick of linking mapsets to temporary GRASS DBs on SSD is not so well known. Maybe adding the possibility of linking to the data catalog, now that multiple GRASS DBs will soon be supported, would help people discover it? That would be a different issue though...

metzm commented 3 years ago

See new PR #1786. Note that the workaround of linking location/mapset/.tmp/ to another (fast) drive can cause fatal errors because files cannot be renamed across mount points. Therefore a new routine had to be added to lib/raster.

Creating the entire .tmp folder in a different location thus seems too dangerous to me. Instead, each lib/module can make use of these new functions and, as before, must take care of cleaning up, renaming or moving temporary data.

The new functions are fairly generic and can be used by modules. E.g. r.watershed could get a new option tmpdir where temporary data should be stored.
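
The cross-mount-point problem mentioned above can be illustrated in a few lines. This is a generic sketch of the usual rename-with-fallback pattern, not the routine added to lib/raster in the PR; `move_tmp_file` is a hypothetical name. A plain rename fails with `EXDEV` when source and destination sit on different file systems, so the data has to be copied and the source removed.

```python
import errno
import os
import shutil


def move_tmp_file(src, dst):
    """Move src to dst, falling back to copy + delete across mount points."""
    try:
        # Fast path: works when both paths are on the same file system.
        os.rename(src, dst)
    except OSError as err:
        if err.errno != errno.EXDEV:
            raise
        # Different mount points: copy the data, then remove the source.
        shutil.copy2(src, dst)
        os.remove(src)
```

Python's own `shutil.move` follows the same idea; the explicit version is shown only to make the failure mode visible. The copy step is also why temporary data on a separate disk costs more than a same-disk rename.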