DOI-USGS / ISIS3

Integrated Software for Imagers and Spectrometers v3. ISIS3 is a digital image processing software package to manipulate imagery collected by current and past NASA and International planetary missions.
https://isis.astrogeology.usgs.gov

findFeaturesSegment.py - /tmp fills immediately if WORKDIR not specified due to image segments location change #5655

Open lwellerastro opened 2 days ago

lwellerastro commented 2 days ago

ISIS version(s) affected: I8.2.0RC1 (not yet released update)

Description
The latest unreleased version of the script includes a new parameter WORKDIR which was added to save temporary lists from findimageoverlaps and overlapstats runs. This was originally requested for debugging purposes in other script posts.

The new parameter and directory work when specified; however, the image segments are now written to this area, which they were not before. The problem occurs when the user doesn't specify WORKDIR: the input/output lists and the image segments are then written to /tmp. Most /tmp areas are very small (cluster nodes and astrovm's have about 9G each for /tmp; my VM has 3G) and fill up very quickly when the LROC NAC segments are written there, especially for numerous runs on a cluster, and the process dies almost immediately. A long LROC NAC image can be 0.5G in size, which is then duplicated when segmented. The image being worked on and all of the potentially overlapping images in the FROMLIST are also present, so a lot of space is needed. /tmp is not an appropriate area for these data.
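To make the space arithmetic concrete, here is a rough pre-flight check. It is only a sketch and is not code from findFeaturesSegment.py: the function name `has_room_for_copies` is hypothetical, and the default factor of 2 (one copy of each image plus roughly the same again for its segments) is an assumption based on the sizes described above.

```python
import os
import shutil


def has_room_for_copies(image_paths, target_dir, copy_factor=2.0):
    """Return True if target_dir has enough free space for the copies.

    copy_factor is an assumed multiplier: each input image is copied once
    and then segmented, so roughly twice its size may be needed.
    """
    # Total size of the inputs, scaled by the assumed duplication factor.
    needed = sum(os.path.getsize(p) for p in image_paths) * copy_factor
    # Free bytes on the filesystem that holds target_dir (e.g. /tmp).
    free = shutil.disk_usage(target_dir).free
    return needed <= free
```

A check like this could be run against /tmp with the MATCH image and the FROMLIST images before anything is copied, so a run fails early with a clear message instead of partway through a copy.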

Originally the script wrote the newly made image segments to the location where the script was being run - we need to go back to this please. It was up to the user to remove the files after the script completed and that should remain as well. The physical segments are needed for debugging the script (because the overlap lists point to the segments as do intermediate networks) so they shouldn't just be temporary. They are easy to remove after the fact.

How to reproduce
Run the new script on a moderately sized test set without setting the new WORKDIR parameter, so that /tmp is used. I'll point to something in the comments that should fail by default.

Possible Solution
The only thing that was requested and that should go into the WORKDIR are the findimageoverlaps FROMLIST and OVERLAPLIST and overlapstats FROMLIST, OVERLAPLIST and TOLIST (not sure this is even created) if different from findimageoverlaps.

The description for WORKDIR is as follows:

             This directory is where any intermediate and temp files are saved to. 
             If this is set to None (default), these files go into the temp directory 
             which is deleted when the program is terminated. Set this if you want to debug 
             a network. 

Perhaps the parameter name should be something new like TEMPDIR, and if the user creates it, it shouldn't be deleted when the program terminates.
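The "don't delete it if the user created it" behavior could be sketched roughly as follows. This is an illustration only, not how the script currently works: `temp_area` and `user_dir` are hypothetical names, and the only library calls used are from Python's standard `tempfile`, `contextlib`, and `pathlib` modules.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def temp_area(user_dir=None):
    """Yield a directory for temporary lists.

    A user-supplied directory is created if needed and left in place;
    an auto-created one is removed when the block exits.
    """
    if user_dir is not None:
        path = Path(user_dir)
        path.mkdir(parents=True, exist_ok=True)
        yield path  # kept afterwards so the user can inspect the files
    else:
        with tempfile.TemporaryDirectory() as tmp:
            yield Path(tmp)  # deleted on exit, as the current description says
```

With this shape, `with temp_area("MyTemp") as d:` leaves MyTemp behind for debugging, while `with temp_area() as d:` cleans up after itself.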

Note that the script has a lot of parameters that are reserved for the findfeatures program run, including DEBUG and DEBUGLOG, so those can't be used for the temporary area. WORKDIR to me says a lot of things beyond how it is currently being used and doesn't say "temporary" to me at all. TEMPDIR might not be the best choice either, but I'm not sure what else to recommend since so many other potential words are reserved.

lwellerastro commented 2 days ago

Example of contents of WORKDIR=TEMP:

TEMP/
from_images_segment1.lis
from_images_segment2.lis
M1214450277LE.lev1.segment1.cub
M1214450277LE.lev1.segment2.cub
M1214450277RE.lev1.segment1.cub
M1214450277RE.lev1.segment2.cub
M156671044LE_cubes_ff_1_from_images_segment1_overlap_fromlist.lis
M156671044LE_cubes_ff_1_from_images_segment1.overlaps
M156671044LE_cubes_ff_1_from_images_segment2_overlap_fromlist.lis
M156671044LE_cubes_ff_1_from_images_segment2.overlaps
M156671044LE_cubes_ff_2_from_images_segment1_overlap_fromlist.lis
M156671044LE_cubes_ff_2_from_images_segment1.overlaps
M156671044LE_cubes_ff_2_from_images_segment2_overlap_fromlist.lis
M156671044LE_cubes_ff_2_from_images_segment2.overlaps
M156671044LE.lev1.segment1.cub
M156671044LE.lev1.segment2.cub

It would be preferable to exclude the segment .cub files from this directory.
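One way to express that split is sketched below. It is hypothetical and not taken from the script: the helper name `output_path` and the suffix list are assumptions drawn from the file names in the listing above (`.lis` and `.overlaps` for the bookkeeping files, everything else treated as image data).

```python
from pathlib import Path

# Suffixes of the small list/overlap files; assumed from the example
# listing in this issue, not read from findFeaturesSegment.py.
LIST_SUFFIXES = (".lis", ".overlaps")


def output_path(filename, list_dir, segment_dir=None):
    """Pick a directory for an intermediate file.

    List and overlap files go to list_dir (WORKDIR or a temp directory);
    anything else, such as segment cubes, goes to segment_dir, which
    defaults to the directory the script is run from.
    """
    segment_dir = Path(segment_dir) if segment_dir is not None else Path.cwd()
    if filename.endswith(LIST_SUFFIXES):
        return Path(list_dir) / filename
    return segment_dir / filename
```

For example, `output_path("from_images_segment1.lis", "TEMP")` resolves under TEMP/, while `output_path("M156671044LE.lev1.segment1.cub", "TEMP")` resolves under the current directory.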

This particular example does not run into the problem with /tmp getting full if WORKDIR is not used because it only involves 3 images, but I wanted to get something up that was more descriptive.

lwellerastro commented 2 days ago

For an example that fills my /tmp area immediately, see Isis3Tests/FFSegmentScript/Git5655/M108313384RE_Network/ under my work users area.

Here is my command and the output (which is captured in log_failure.out):

./findFeaturesSegment.py algorithm=fast/brief maxthreads=7 match=../Lev1/M108313384RE.lev1.cub fromlist=M108313384RE_fromlist_ff.lis fastgeom=true geomtype=camera maxpoints=10000 epitolerance=9.0 ratio=0.9 hmgtolerance=9.0 filter=sobel networkid=M108313384RE pointid='M108313384RE_ff_?????' onet=M108313384RE_ff.net tolist=M108313384RE_cubes_ff.lis tonotmatched=M108313384RE_notmatched_ff.lis description='Create image-image control network' debug=true debuglog=M108313384RE_ff.log

DEBUG:root:nlines: 30000
DEBUG:root:nlines: 30000
DEBUG:root:nlines: 30000
DEBUG:root:nlines: 30000
DEBUG:root:nlines: 30000
DEBUG:root:nlines: 30000
DEBUG:root:nlines: 30000
Traceback (most recent call last):
  File "/work/users/lweller/Isis3Tests/FFSegmentScript/Git5655/M108313384RE_Network/./findFeaturesSegment.py", line 447, in <module>
    raise e 
    ^^^^^^^
  File "/work/users/lweller/Isis3Tests/FFSegmentScript/Git5655/M108313384RE_Network/./findFeaturesSegment.py", line 443, in <module>
    findFeaturesSegment(ui, workdir) 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/users/lweller/Isis3Tests/FFSegmentScript/Git5655/M108313384RE_Network/./findFeaturesSegment.py", line 341, in findFeaturesSegment
    output = output.get()
             ^^^^^^^^^^^^
  File "/usgs/cpkgs/anaconda3_linux/envs/isis8.3.0/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/usgs/cpkgs/anaconda3_linux/envs/isis8.3.0/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/usgs/cpkgs/anaconda3_linux/envs/isis8.3.0/lib/python3.11/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/users/lweller/Isis3Tests/FFSegmentScript/Git5655/M108313384RE_Network/./findFeaturesSegment.py", line 113, in segment
    shutil.copyfile(img_path, work_img)
  File "/usgs/cpkgs/anaconda3_linux/envs/isis8.3.0/lib/python3.11/shutil.py", line 269, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/usgs/cpkgs/anaconda3_linux/envs/isis8.3.0/lib/python3.11/shutil.py", line 158, in _fastcopy_sendfile
    raise err from None
  File "/usgs/cpkgs/anaconda3_linux/envs/isis8.3.0/lib/python3.11/shutil.py", line 144, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 28] No space left on device: '/work/users/lweller/Isis3Tests/FFSegmentScript/Git5655/Lev1/M1133330211LE.lev1.cub' -> '/tmp/tmp0nextq1m/M1133330211LE.lev1.cub'

Contents of the /tmp area:

ls -1 /tmp/tmpefpk1_ap/
M105939370LE.lev1.cub
M1120292097RE.lev1.cub
M1130970104LE.lev1.cub
M1133330211LE.lev1.cub
M1133330211RE.lev1.cub
M1342772956LE.lev1.cub
M1342772956RE.lev1.cub

du -ksch /tmp/tmpefpk1_ap/
2.4G    /tmp/tmpefpk1_ap/
2.4G    total

I didn't realize the original images were also being copied there. Yikes, that's a lot of duplicated data.

The original cubes require 8.4G. When I rerun using workdir=MyTemp, that directory uses 4.5G after the copied lev1.cub files are removed, but before that I saw it go up to 6.4G. Either way, it's too much for my 3G /tmp area.