davidfrantz / force

Framework for Operational Radiometric Correction for Environmental monitoring
GNU General Public License v3.0

force-level2 fails to delete log #168

Closed by bountrisv 2 years ago

bountrisv commented 2 years ago

Note: I am using an older version of FORCE (and may have to stay on it to keep runtime comparisons across experiments fair).

Across multiple invocations of force-level2, the script fails, seemingly at random, after correctly processing the image, because it cannot delete a log file.

ETA: 0s Left: 0 AVG: 0.00s  local:0/1/100%/346.0s
ETA: 0s Left: 0 AVG: 0.00s  local:0/1/100%/346.0s
[2022-02-17 00:05:11,541] {pod_launcher.py:149} INFO - rm: cannot remove '/data/outputs/level2_tmp/cpu-20220216235924': No such file or directory

Is there a way to catch or avoid the error? Is it because I am on an older version of FORCE?

Setup

```
# INPUT/OUTPUT DIRECTORIES
# ------------------------------------------------------------------------
FILE_QUEUE = /data/outputs/queue_files/queue_0313.txt
DIR_LEVEL2 = /data/outputs/level2_ard/
DIR_LOG = /data/outputs/level2_log/
DIR_TEMP = /data/outputs/level2_tmp/

# DIGITAL ELEVATION MODEL
# ------------------------------------------------------------------------
FILE_DEM = /data/outputs/auxillary_data/dem/crete_srtm-aster.vrt
DEM_NODATA = -32767

# DATA CUBES
# ------------------------------------------------------------------------
DO_REPROJ = TRUE
DO_TILE = TRUE
FILE_TILE = /data/outputs/allowed_tiles.txt
TILE_SIZE = 30000.000000
BLOCK_SIZE = 3000.000000
RESOLUTION_LANDSAT = 30
RESOLUTION_SENTINEL2 = 10
ORIGIN_LON = -25.000000
ORIGIN_LAT = 60.000000
PROJECTION = PROJCS["ETRS89 / LAEA Europe",GEOGCS["ETRS89",DATUM["European_Terrestrial_Reference_System_1989",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6258"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4258"]],PROJECTION["Lambert_Azimuthal_Equal_Area"],PARAMETER["latitude_of_center",52],PARAMETER["longitude_of_center",10],PARAMETER["false_easting",4321000],PARAMETER["false_northing",3210000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AUTHORITY["EPSG","3035"]]
RESAMPLING = CC

# RADIOMETRIC CORRECTION OPTIONS
# ------------------------------------------------------------------------
DO_ATMO = TRUE
DO_TOPO = TRUE
DO_BRDF = TRUE
ADJACENCY_EFFECT = TRUE
MULTI_SCATTERING = TRUE

# WATER VAPOR CORRECTION OPTIONS
# ------------------------------------------------------------------------
DIR_WVPLUT = /data/outputs/auxillary_data/wvdb
WATER_VAPOR = NULL

# AEROSOL OPTICAL DEPTH OPTIONS
# ------------------------------------------------------------------------
DO_AOD = TRUE
DIR_AOD = NULL

# CLOUD DETECTION OPTIONS
# ------------------------------------------------------------------------
ERASE_CLOUDS = FALSE
MAX_CLOUD_COVER_FRAME = 75
MAX_CLOUD_COVER_TILE = 75
CLOUD_BUFFER = 300
SHADOW_BUFFER = 90
SNOW_BUFFER = 30
CLOUD_THRESHOLD = 0.225
SHADOW_THRESHOLD = 0.02

# RESOLUTION MERGING
# ------------------------------------------------------------------------
RES_MERGE = IMPROPHE

# CO-REGISTRATION OPTIONS
# ------------------------------------------------------------------------
DIR_COREG_BASE = NULL
COREG_BASE_NODATA = -9999

# MISCELLANEOUS OPTIONS
# ------------------------------------------------------------------------
IMPULSE_NOISE = TRUE
BUFFER_NODATA = FALSE

# TIER LEVEL
# ------------------------------------------------------------------------
TIER = 1

# PARALLEL PROCESSING
# ------------------------------------------------------------------------
# Multiprocessing options (NPROC, DELAY) only apply when using the batch
# utility force-level2. They are not used by the core function force-l2ps.
# ------------------------------------------------------------------------
NPROC = 1
NTHREAD = 4
PARALLEL_READS = FALSE
DELAY = 3
TIMEOUT_ZIP = 30

# OUTPUT OPTIONS
# ------------------------------------------------------------------------
OUTPUT_FORMAT = GTiff
OUTPUT_DST = FALSE
OUTPUT_AOD = FALSE
OUTPUT_WVP = FALSE
OUTPUT_VZN = FALSE
OUTPUT_HOT = FALSE
OUTPUT_OVV = TRUE

++PARAM_LEVEL2_END++
```

thielfab commented 2 years ago

Hi bountrisv, how many pods are you running simultaneously? Given the specs of each pod, I assume you want to process one image per pod? I'm not exactly sure what your setup is (and I have little to no knowledge of Kubernetes), but I think in this case you should use force-l2ps directly: https://force-eo.readthedocs.io/en/latest/components/lower-level/level2/l2ps.html#level2-wrapper

As far as I remember, force-level2 generates a TXT file (named cpu- + timestamp) that lets you change the number of processes while the task is running. In your case, if you start a lot of pods at the same time, they might generate two or more files with the same name and hence cause some kind of conflict (assuming the pods share a common file system). But that's just a guess ...
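The suspected collision can be sketched in a few lines of Python. This is only an illustration, not FORCE code: it assumes the file name is `cpu-` plus a timestamp with one-second resolution (which is what the name in the error message above looks like), and the helpers `cpu_file_name` and `with_private_temp_dir` are hypothetical. If the pods do share a file system, giving each pod its own DIR_TEMP in its copy of the parameter file might be one way to keep the names apart.

```python
import re
from datetime import datetime


def cpu_file_name(now: datetime) -> str:
    """Assumed naming scheme: 'cpu-' + timestamp with one-second resolution."""
    return "cpu-" + now.strftime("%Y%m%d%H%M%S")


def with_private_temp_dir(param_text: str, pod_id: str) -> str:
    """Return parameter-file text whose DIR_TEMP ends in a per-pod subdirectory."""
    def repl(match: re.Match) -> str:
        base = match.group(2).rstrip("/")
        return f"{match.group(1)}{base}/{pod_id}/"

    new_text, count = re.subn(r"(?m)^(DIR_TEMP\s*=\s*)(\S+)", repl, param_text)
    if count != 1:
        raise ValueError("expected exactly one DIR_TEMP entry")
    return new_text


# Two pods that start within the same second derive the same file name.
start = datetime(2022, 2, 16, 23, 59, 24)
late = start.replace(microsecond=500_000)
print(cpu_file_name(start) == cpu_file_name(late))

# Hypothetical pod identifier; the directory itself must exist before the run.
params = "DIR_LOG = /data/outputs/level2_log/\nDIR_TEMP = /data/outputs/level2_tmp/\n"
print(with_private_temp_dir(params, "pod-07"))
```

Each pod would still need to create its subdirectory before starting the run, and whether this fully avoids the conflict depends on how the installed FORCE version builds the file name.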

davidfrantz commented 2 years ago

Yep, pretty sure that is happening. force-level2 deletes that file when it is finished. This program was never designed to be put in another parallel section.

You can

Cheers, David

bountrisv commented 2 years ago

Hey all, thanks for your responses!

I run from 10 to about 65 pods. If they can create the same file name, that may very well be the cause of the problem. Thanks for the proposed solutions.

Before I close the issue, one question regarding the last proposed solution:

Does force-l2ps take care of merging images as well, or would that have to be done manually?

davidfrantz commented 2 years ago

Merging will be done, too!

I believe the only thing to really consider is whether to use l2ps or l2ps_. Use l2ps if your input images are unpacked. Use l2ps_ if the input images are still packed (zip/tar/tar.gz).
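For an automated pipeline, the packed/unpacked distinction could be checked per input before deciding how to process it. The sketch below is an assumption-laden illustration: `is_packed` is a hypothetical helper that judges only by file name, and the example paths are made up.

```python
from pathlib import Path

# Archive endings mentioned in the thread: zip, tar, tar.gz.
PACKED_SUFFIXES = (".zip", ".tar", ".tar.gz")


def is_packed(image_path: str) -> bool:
    """True if the path looks like a zip/tar/tar.gz archive, judged by name only."""
    return Path(image_path).name.lower().endswith(PACKED_SUFFIXES)


# Hypothetical inputs: one archive and one already-extracted image directory.
for candidate in ("/data/inputs/scene_a.tar.gz", "/data/inputs/scene_b"):
    print(candidate, "packed" if is_packed(candidate) else "unpacked")
```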

Another small difference that you should be aware of is the handling of the logfiles. force-l2ps simply writes to stdout, while force-level2 redirects this stream to a logfile (DIR_LOG).
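If per-image logfiles are still wanted when calling force-l2ps directly, the redirection could be reproduced by the caller. The following is a rough sketch, not how force-level2 does it internally: `run_with_logfile` is a hypothetical helper, the log file name is made up, and a stand-in command is used so the example runs without FORCE installed. The real command line should be taken from the usage message of the installed force-l2ps.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_with_logfile(cmd: Sequence[str], log_path: Path) -> int:
    """Run cmd, writing its stdout and stderr to log_path; return the exit code."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w") as log:
        result = subprocess.run(list(cmd), stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in command so the sketch runs anywhere. In a pod this would be the
# force-l2ps call (arguments are an assumption; check the tool's usage output).
demo_cmd = [sys.executable, "-c", "print('stand-in for force-l2ps output')"]
demo_log = Path(tempfile.mkdtemp()) / "level2_log" / "image_0001.log"

exit_code = run_with_logfile(demo_cmd, demo_log)
print(exit_code, demo_log.read_text().strip())
```

Writing one logfile per image, named after the image rather than a timestamp, also avoids the kind of name clash discussed above.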

Cheers, David

bountrisv commented 2 years ago

Thank you for your fast and helpful responses. The issue is caused by the way I am executing FORCE, and replacing force-level2 with force-l2ps works well in my case, so I am closing this.