davidfrantz / force

Framework for Operational Radiometric Correction for Environmental monitoring
GNU General Public License v3.0

[force-l2ps] High number of opened files by the force-l2ps process #314

Closed: thielfab closed this issue 7 months ago

thielfab commented 7 months ago

It seems that force-l2ps opens quite a few files while running. This does not surprise me, as I use VRT files covering the whole datacube area for the DEM and for the Landsat NIR coregistration base image. So what? Well, my system admin called me because he noticed that I am seemingly putting quite some stress on the file system 😃

So I ran a test with a single instance of force-l2ps (i.e., processing one Sentinel-2 image). I tracked the open files at 30 s intervals and logged everything (`lsof -r 30 -u myusername`).
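
For a quick per-process count without parsing `lsof` output, the descriptors of one process can be sampled directly. The sketch below assumes Linux (it reads `/proc/<pid>/fd`) and that you run it as the same user as the process; the function names and the PID in the usage line are hypothetical. Note that `lsof -u` lists every process of the user and also reports memory-mapped libraries, the working directory, etc., so its numbers will be higher than this descriptor count.

```python
import os
import time
from collections import Counter


def open_files(pid):
    """Return the target paths of all file descriptors currently open in `pid`.

    Linux-only: reads the symlinks in /proc/<pid>/fd. Memory-mapped files
    and the cwd (which lsof also reports) are not included.
    """
    fd_dir = f"/proc/{pid}/fd"
    paths = []
    for fd in os.listdir(fd_dir):
        try:
            paths.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return paths


def count_by_suffix(paths, suffixes=(".tif", ".vrt")):
    """Count open paths per suffix of interest (case-insensitive)."""
    counts = Counter()
    for p in paths:
        for s in suffixes:
            if p.lower().endswith(s):
                counts[s] += 1
    return counts


def monitor(pid, interval=30.0, samples=10):
    """Print the number of open descriptors of `pid` every `interval` seconds."""
    for _ in range(samples):
        try:
            paths = open_files(pid)
        except FileNotFoundError:
            print(f"process {pid} has exited")
            break
        print(time.strftime("%H:%M:%S"), len(paths), dict(count_by_suffix(paths)))
        time.sleep(interval)
```

Usage, with a hypothetical PID of the running force-l2ps process: `monitor(12345, interval=30)`. Splitting the count by suffix makes it easy to see whether the DEM tiles or the coregistration base tiles dominate.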

Starting from a baseline of 160, the number of files opened by the force-l2ps process rises to a maximum of 741. At that point, more than 500 of the entries are *.tif files from the DEM. This surprises me, as that is already half of the whole study area. Shouldn't the number of tiles required be much smaller, around 9 or so (I am using the 1° COPERNICUS DEM tiles)? In the co-registration step, the number of open files only increases from 160 to 180, and those 20 files are the tifs from the COREG_BASE VRT. So there it seems that only the required extent is actually loaded/accessed.
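
A back-of-the-envelope check of the "~9 tiles" estimate: a Sentinel-2 tile is 109.8 km on a side, and one degree of latitude is roughly 111 km, while one degree of longitude shrinks with cos(latitude). The sketch below (hypothetical helper names) approximates the scene footprint as a lon/lat box and lists the 1° tiles it touches. It ignores UTM grid convergence and any margin a warper might add, so it is only a rough estimate.

```python
import math


def tiles_for_bbox(west, south, east, north):
    """List the 1-degree tiles, keyed by their south-west corner (lat, lon),
    that a lon/lat bounding box intersects. Tiles span [lon, lon+1) x [lat, lat+1)."""
    lons = range(math.floor(west), math.ceil(east))
    lats = range(math.floor(south), math.ceil(north))
    return [(lat, lon) for lat in lats for lon in lons]


def approx_bbox(center_lon, center_lat, size_km=109.8):
    """Rough lon/lat box of a square footprint of `size_km` centred on a point.

    Uses ~111.32 km per degree of latitude and scales longitude by cos(lat);
    this is an approximation, not a proper UTM-to-geographic transform.
    """
    half_lat = (size_km / 2) / 111.32
    half_lon = (size_km / 2) / (111.32 * math.cos(math.radians(center_lat)))
    return (center_lon - half_lon, center_lat - half_lat,
            center_lon + half_lon, center_lat + half_lat)


# A scene centred at 11.3 E / 48.2 N spans about 1.5 deg of longitude and
# 1 deg of latitude, so it touches only a handful of 1-degree DEM tiles.
scene_tiles = tiles_for_bbox(*approx_bbox(11.3, 48.2))
```

For a mid-latitude scene this yields a 3 × 2 block of tiles, so a few up to about nine tiles is the order of magnitude one would expect, far below 500.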

I guess my question is whether my DEM VRT is wrong or faulty, or whether FORCE indeed needs such a large extent of the DEM for one S2 scene?
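
One way to sanity-check the VRT itself is to read its XML and list which sources overlap the scene's bounding box; if a box of roughly one or two degrees matches hundreds of sources, the source rectangles or the GeoTransform in the VRT would be suspect. The sketch below assumes a north-up `<GeoTransform>` (no rotation terms), that each source carries a `<DstRect>`, and that the bbox is given in the VRT's CRS (geographic lon/lat for the Copernicus DEM). It only inspects the first band, and the function name is hypothetical. It says nothing about which files GDAL actually opens at run time.

```python
import xml.etree.ElementTree as ET


def vrt_sources_in_bbox(vrt_xml, bbox):
    """Return the SourceFilename of every source in the first VRT band whose
    destination rectangle overlaps bbox = (xmin, ymin, xmax, ymax)."""
    root = ET.fromstring(vrt_xml)
    x0, dx, _, y0, _, dy = [float(v) for v in root.findtext("GeoTransform").split(",")]
    xmin, ymin, xmax, ymax = bbox
    hits = []
    band = root.find("VRTRasterBand")
    for src in band:
        rect = src.find("DstRect")
        name = src.findtext("SourceFilename")
        if rect is None or name is None:
            continue  # e.g. <ColorInterp> or <NoDataValue> elements
        px, py = float(rect.get("xOff")), float(rect.get("yOff"))
        nx, ny = float(rect.get("xSize")), float(rect.get("ySize"))
        # Pixel rectangle -> georeferenced extent (dy is negative for north-up).
        xa, xb = x0 + px * dx, x0 + (px + nx) * dx
        ya, yb = y0 + py * dy, y0 + (py + ny) * dy
        west, east = min(xa, xb), max(xa, xb)
        south, north = min(ya, yb), max(ya, yb)
        # Strict inequalities: tiles that merely share an edge do not count.
        if west < xmax and east > xmin and south < ymax and north > ymin:
            hits.append(name.strip())
    return hits
```

Usage, with a hypothetical file name: `vrt_sources_in_bbox(open("dem.vrt").read(), (10.5, 47.7, 12.1, 48.7))`.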

Used version: FORCE v. 3.7.12

davidfrantz commented 7 months ago

Hi Fabian,

the same function is used to warp the DEM and base image, and both use the image under consideration as target...

As this relies heavily on the GDAL API, I fear that I can't really answer this... All I can say is that FORCE only opens the VRT. What GDAL does internally is likely out of my control.

However, isn't your system caching the DEM when you access it repeatedly in batch processing?

Cheers

thielfab commented 7 months ago

> the same function is used to warp the DEM and base image, and both use the image under consideration as target...

Well, then it is as you say: probably GDAL, and we have to assume it works the way it's supposed to. But it's still weird that so many files are opened for the DEM. A projected vs. unprojected CRS issue? Anyway... It turned out my jobs were most likely not the cause of the file system issues, so I'll just continue with my workflow 👍