cogeotiff / rio-cogeo

Cloud Optimized GeoTIFF creation and validation plugin for rasterio
https://cogeotiff.github.io/rio-cogeo/
BSD 3-Clause "New" or "Revised" License

Change folder of "execution" #253

Closed thaarhoff closed 1 year ago

thaarhoff commented 1 year ago

Hi,

in our project, we want to run rio cogeo create daily within a k8s cluster. This is currently set up as a cronjob that deploys a custom Docker container (Unix-based) linked to a PVC. Let's say it's mounted at /opt/images. From this location the job runs

rio cogeo create --cog-profile deflate --blocksize 256 --overview-level 8 --overview-resampling bilinear source_file target_file

What we noticed is that, while the input file is being read, the /tmp directory grows rapidly, up to the point where the container gets evicted when processing large files.

I couldn't find a suitable configuration option to change the folder to point to the PVC mount at /opt/images. Is this possible somehow?

(Sure, I could change the mount, but the job is also used for other processes that are configured for the current mount, and it would be preferred to tell rio cogeo where to run.)

Greetings

vincentsarago commented 1 year ago

@thaarhoff is your source_file a local file?

Maybe you can try setting the CPL_TMPDIR environment variable to point to a specific directory.

Is target_file in /tmp/....?

By default we create a temporary file in the same directory as the target destination: https://github.com/cogeotiff/rio-cogeo/blob/main/rio_cogeo/cogeo.py#L28-L40
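For readers who want to try the CPL_TMPDIR suggestion from a script, here is a minimal sketch. CPL_TMPDIR is a GDAL configuration option that can be set through the environment; whether it affects a given temporary file depends on how that file is created. The helper name `build_cogeo_call` is hypothetical, and the paths are the placeholder paths from this thread.

```python
import os
import shutil
import subprocess


def build_cogeo_call(source, target, tmp_dir):
    """Return (command, environment) for a rio cogeo run with CPL_TMPDIR set.

    Hypothetical helper for illustration; it is not part of rio-cogeo.
    """
    # Copy the current environment and point GDAL's temp directory at tmp_dir.
    env = dict(os.environ, CPL_TMPDIR=tmp_dir)
    cmd = [
        "rio", "cogeo", "create",
        "--cog-profile", "deflate",
        "--blocksize", "256",
        "--overview-level", "8",
        "--overview-resampling", "bilinear",
        source,
        target,
    ]
    return cmd, env


cmd, env = build_cogeo_call(
    "/opt/images/source_file",
    "/opt/images/result/target_file",
    "/opt/images",
)

# Only run the command when the rio CLI and the source file are actually present.
if shutil.which("rio") and os.path.exists(cmd[-2]):
    subprocess.run(cmd, env=env, check=True)
```

Passing the variable through `env` keeps the change local to this one process, so other jobs in the same container keep their own settings.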

thaarhoff commented 1 year ago

Thanks for the quick response. rio is executed in /opt/images. The source file is also in that folder. The target file would be in an (already existing) subfolder.

/opt/images < execution
/opt/images/source_file
/opt/images/result/target_file

I'll set the environment variable and come back.

thaarhoff commented 1 year ago

Before rio execution

/# du --max-depth=1
68028460        ./opt
8       ./tmp

echo $CPL_TMPDIR
/opt/geoimages/result

Starting rio:
Reading input: /opt/geoimages/input.tif
and tmp starts rising again.

/# du --max-depth=1
68028460        ./opt
918124  ./tmp

vincentsarago commented 1 year ago

And what's in tmp?

thaarhoff commented 1 year ago

Before, tmp is empty. During the run, a tmp-tif file is generated there.

And that is not the filename of the target_file.

vincentsarago commented 1 year ago

Interesting. So the file is created in tmp.

Can you make sure that --allow-intermediate-compression is used?

thaarhoff commented 1 year ago

I added the option. It grows more slowly, but the file is still created in tmp. 3.4GB at the moment.

And the container is evicted at 12GB, since /vda1 is full, while the PVC with 400GB is untouched :D

vincentsarago commented 1 year ago

🤦 I see! I guess in https://github.com/cogeotiff/rio-cogeo/blob/main/rio_cogeo/cogeo.py#L33, is_file() will only return True when we pass a valid (existing) file.

I'll fix this and release a new version 🙏
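The behaviour described above can be reproduced with the standard library alone. The sketch below is not the rio-cogeo source. The two helper functions are hypothetical, and checking the parent directory is only one plausible way to fix it. It shows that `pathlib.Path.is_file()` returns False for a target that has not been written yet, so the directory choice falls back to None and `tempfile` uses the system default (typically /tmp).

```python
import pathlib
import tempfile


def tmp_dir_buggy(dst_path):
    """Hypothetical: use the target's directory only if the target already exists."""
    path = pathlib.Path(dst_path)
    # is_file() is False for a path that does not exist yet, so this returns None.
    return str(path.parent) if path.is_file() else None


def tmp_dir_fixed(dst_path):
    """Hypothetical: use the target's directory whenever that directory exists."""
    path = pathlib.Path(dst_path)
    return str(path.parent) if path.parent.is_dir() else None


with tempfile.TemporaryDirectory() as workdir:
    # The target lives in an existing subfolder but has not been created yet.
    target = pathlib.Path(workdir) / "result" / "target_file.tif"
    target.parent.mkdir()

    buggy = tmp_dir_buggy(target)  # None, so tempfile falls back to its default
    fixed = tmp_dir_fixed(target)  # the "result" folder next to the target

    # With dir=fixed the temporary file is created beside the target.
    with tempfile.NamedTemporaryFile(dir=fixed, suffix=".tif") as tmp:
        tmp_parent = pathlib.Path(tmp.name).parent

    print("buggy:", buggy)
    print("fixed:", fixed)
```

With `dir=None`, `tempfile.NamedTemporaryFile` writes to the directory returned by `tempfile.gettempdir()`, which matches the growth in /tmp reported earlier in the thread.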

thaarhoff commented 1 year ago

Thank you for investigating! I'll look forward to it.

vincentsarago commented 1 year ago

The original bug should be fixed in 3.5.1 (which should be on PyPI in a couple of minutes). I'll open another PR later to add a new option for users to pass a temporary directory.

thaarhoff commented 1 year ago

Works. Thank you for the quick fix <3