Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

use zip files for UDF dependencies #845

Closed JeroenVerstraelen closed 2 weeks ago

JeroenVerstraelen commented 2 months ago

A more permanent solution for https://github.com/eu-cdse/openeo-cdse-infra/issues/112

JeroenVerstraelen commented 1 month ago

To be checked:

JeroenVerstraelen commented 1 month ago

Blocked by fusemount

soxofaan commented 3 weeks ago

I decided to merge #914. I tried to cover as much as possible with unit tests, but some parts were not feasible to test that way (also see #915). I had hoped to also do some real testing on k8s, but it was unclear how much additional time I would need for that (e.g. setting up this skaffold thing). This PR just adds a new opt-in feature and should not change the current behavior.

soxofaan commented 3 weeks ago

enabled ZIP based UDF dependency handling on CDSE dev with os_creodias_openeo_k8s/commits/657d38743ecacf73efdc40a5519cf3faa97d615c

now waiting for deploy pipelines to make sure this is in place and can be tested

soxofaan commented 2 weeks ago

integration tests on CDSE seem to be failing on this since https://jenkins.vgt.vito.be/job/openEO/job/openeo-deploy-cdse/2502/ . Investigating

soxofaan commented 2 weeks ago

I hope the CDSE integration tests now pass with these latest commits: https://github.com/Open-EO/openeo-geopyspark-driver/commit/50a127a438dfa8327a54e0bdc4e52adc782bb51e and https://github.com/Open-EO/openeo-geopyspark-driver/commit/84aecb7ca2bd7a751a7e0f605257974b209effab

soxofaan commented 2 weeks ago

CDSE integration tests now passed

relevant logs from the related "test_udf_dependency_handling" job j-241030b14656437fa80e80da41ff9600

driver:

Installing Python UDF dependencies with ['/opt/venv/bin/python3', '-m', 'pip', 'install', '--target', '/opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages', '--progress-bar', 'off', '--disable-pip-version-check', '--no-input', '--retries', '2', '--timeout', '20', 'alabaster==0.7.13']: start 2024-10-30 11:13:51.678563
pip install exited with exit code 0
Installing Python UDF dependencies with ['/opt/venv/bin/python3', '-m', 'pip', 'install', '--target', '/opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages', '--progress-bar', 'off', '--disable-pip-version-check', '--no-input', '--retries', '2', '--timeout', '20', 'alabaster==0.7.13']: end 2024-10-30 11:13:53.614273, elapsed 0:00:01.935710
Archiving Python UDF dependencies from /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages to zip archive /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive: start 2024-10-30 11:13:53.614487
Archiving Python UDF dependencies from /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages to zip archive /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive: end 2024-10-30 11:13:53.617926, elapsed 0:00:00.003439
Copying /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive.zip (17183 bytes) to /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip

Note how pip install is done to temp dir /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages, which is first zipped (in 0.0034s) to another temp location /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive.zip and then copied to /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip
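For readers who want the driver-side flow in code form, here is a minimal sketch of the three steps visible in the log above (pip install to a target folder, zip, copy). It is not the actual implementation from #914: the function names `build_pip_command`, `archive_and_publish` and `pack_udf_dependencies` are hypothetical, and only the pip flags and path layout are taken from the log.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def build_pip_command(packages: List[str], target: Path) -> List[str]:
    """Build a pip invocation with the same flags as seen in the driver log."""
    return [
        sys.executable, "-m", "pip", "install",
        "--target", str(target),
        "--progress-bar", "off",
        "--disable-pip-version-check",
        "--no-input",
        "--retries", "2",
        "--timeout", "20",
        *packages,
    ]


def archive_and_publish(packages_dir: Path, destination: Path) -> Path:
    """Zip the installed packages folder and copy the archive to `destination`."""
    # shutil.make_archive appends ".zip" itself, which matches the log:
    # the archive base is ".../archive", the resulting file ".../archive.zip".
    archive_base = packages_dir.parent / "archive"
    archive_path = Path(
        shutil.make_archive(str(archive_base), "zip", root_dir=str(packages_dir))
    )
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(archive_path, destination)
    return destination


def pack_udf_dependencies(packages: List[str], destination: Path) -> Path:
    """Hypothetical end-to-end helper: install, archive, publish."""
    with tempfile.TemporaryDirectory(prefix="udfpydeps-pack-") as tmp:
        packages_dir = Path(tmp) / "packages"
        packages_dir.mkdir()
        # Requires network access to the package index.
        subprocess.run(build_pip_command(packages, packages_dir), check=True)
        return archive_and_publish(packages_dir, destination)
```

In this sketch, the equivalent of the logged run would be `pack_udf_dependencies(["alabaster==0.7.13"], Path("/batch_jobs/<job-id>/udf-py-deps.zip"))`, with `<job-id>` standing in for the batch job id.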

relevant logs from executor:

run_udf with install_mode='zip' udf_python_dependencies_archive_path='/batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip'
Extracting Python UDF dependencies from /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip to /tmp/udfpypeps-unpack-lrbgture: start 2024-10-30 11:14:45.746446
Extracting Python UDF dependencies from /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip to /tmp/udfpypeps-unpack-lrbgture: end 2024-10-30 11:14:45.760857, elapsed 0:00:00.014411
Cleaning up temporary UDF deps at /tmp/udfpypeps-unpack-lrbgture

Note how dependencies are unzipped on the fly from /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip to a temp folder /tmp/udfpypeps-unpack-lrbgture (in 0.014s). The temp folder is automatically cleaned up after running the UDF.
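The executor-side behavior (extract to a temp folder, run the UDF, clean up) maps naturally onto a context manager. The sketch below is an illustration under assumptions, not the code that was merged: the name `unpacked_udf_dependencies` is hypothetical, and making the extracted packages importable by appending the folder to `sys.path` is my assumption about how the UDF would find them.

```python
import contextlib
import shutil
import sys
import tempfile
import zipfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def unpacked_udf_dependencies(archive_path: Path) -> Iterator[Path]:
    """Extract a dependency archive to a temp folder for the duration of a UDF run."""
    unpack_dir = Path(tempfile.mkdtemp(prefix="udfpydeps-unpack-"))
    try:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(unpack_dir)
        # Assumption: packages are made importable through sys.path.
        sys.path.append(str(unpack_dir))
        try:
            yield unpack_dir
        finally:
            sys.path.remove(str(unpack_dir))
    finally:
        # Always remove the temp folder, also when the UDF raises.
        shutil.rmtree(unpack_dir, ignore_errors=True)
```

Usage would look like `with unpacked_udf_dependencies(Path(archive_path)): run_the_udf()`, where `run_the_udf` is a placeholder for the actual UDF call.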

soxofaan commented 2 weeks ago

One concern is these warnings in the executor logs:

WARNING  Empty/non-existent UDF_PYTHON_DEPENDENCIES_ARCHIVE_PATH /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip

Which is probably related to the S3-FUSE-mount delay issue we are seeing in other places.

After some time the file seems to exist and the UDF works. I guess I should put back the sleep config I removed with os_creodias_openeo_k8s/commits/657d38743ecacf73efdc40a5519cf3faa97d615c

soxofaan commented 2 weeks ago

Pushed some more tweaks to various pipelines.

when these become active I can give this a final test

soxofaan commented 2 weeks ago

Still struggling with the ZIP archive availability in the executors. The archive is created successfully in the driver, e.g. at /batch_jobs/j-24103037146945588a44e866e66c11bd/udf-py-deps.zip, but 30 seconds later it is still not available on the executors:

Image

soxofaan commented 2 weeks ago

bumped sleeping to 30s in os_creodias_openeo_k8s/commits/2f76a7dc4c0a67a1d717c61aaee8ef7cd9c75c6 which seems to make the integration tests pass

soxofaan commented 2 weeks ago

I considered making the sleeping smarter (exponential backoff or something similar), but maybe that's for another ticket:
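To make the "smarter sleeping" idea concrete, here is a rough sketch of waiting for the archive with exponential backoff instead of a fixed sleep. Nothing like this exists in the driver as far as this thread shows: `wait_for_file`, its parameters and the default values are all hypothetical. The `sleep` and `clock` arguments are only there to make it testable without real waiting.

```python
import time
from pathlib import Path
from typing import Callable


def _is_non_empty_file(path: Path) -> bool:
    """True if `path` is an existing regular file with at least one byte."""
    try:
        return path.is_file() and path.stat().st_size > 0
    except OSError:
        # The file can vanish (or not be visible yet) between the two checks.
        return False


def wait_for_file(
    path: Path,
    timeout: float = 60.0,
    initial_delay: float = 0.5,
    factor: float = 2.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll for `path` with exponentially growing delays, up to `timeout` seconds."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if _is_non_empty_file(path):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * factor, max_delay)
```

Compared to a fixed 30s sleep, this returns as soon as the file shows up on the mount and only pays the full timeout in the failure case.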

soxofaan commented 2 weeks ago

did a final manual test against openeo.dev.warsaw.openeo.dataspace.copernicus.eu: j-241031f4d3254949b3f7704cd386eaab

here are relevant logs from kibana: image

verifies that it works, so time to close this ticket