JeroenVerstraelen closed this 2 weeks ago
To be checked:
Blocked by fusemount
I decided to merge #914. I tried to cover as much as possible with unit testing, but some parts were not feasible (also see #915). I had hoped to also do some real testing on k8s, but it was unclear how much additional time I would need for that (e.g. setting up this skaffold thing). This PR also just adds a new feature as opt-in and should not change the current behavior.
enabled ZIP based UDF dependency handling on CDSE dev with os_creodias_openeo_k8s/commits/657d38743ecacf73efdc40a5519cf3faa97d615c
now waiting for deploy pipelines to make sure this is in place and can be tested
integration tests on CDSE seem to be failing on this since https://jenkins.vgt.vito.be/job/openEO/job/openeo-deploy-cdse/2502/ . Investigating
I hope the CDSE integration tests now pass with these latest commits: https://github.com/Open-EO/openeo-geopyspark-driver/commit/50a127a438dfa8327a54e0bdc4e52adc782bb51e and https://github.com/Open-EO/openeo-geopyspark-driver/commit/84aecb7ca2bd7a751a7e0f605257974b209effab
CDSE integration tests now passed
relevant logs from the related "test_udf_dependency_handling" job j-241030b14656437fa80e80da41ff9600
driver:
Installing Python UDF dependencies with ['/opt/venv/bin/python3', '-m', 'pip', 'install', '--target', '/opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages', '--progress-bar', 'off', '--disable-pip-version-check', '--no-input', '--retries', '2', '--timeout', '20', 'alabaster==0.7.13']: start 2024-10-30 11:13:51.678563
pip install exited with exit code 0
Installing Python UDF dependencies with ['/opt/venv/bin/python3', '-m', 'pip', 'install', '--target', '/opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages', '--progress-bar', 'off', '--disable-pip-version-check', '--no-input', '--retries', '2', '--timeout', '20', 'alabaster==0.7.13']: end 2024-10-30 11:13:53.614273, elapsed 0:00:01.935710
Archiving Python UDF dependencies from /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages to zip archive /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive: start 2024-10-30 11:13:53.614487
Archiving Python UDF dependencies from /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages to zip archive /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive: end 2024-10-30 11:13:53.617926, elapsed 0:00:00.003439
Copying /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive.zip (17183 bytes) to /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip
Note how pip install is done to the temp dir /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/packages, which is first zipped (in 0.0034s) to another temp location /opt/spark/work-dir/tmp/udfpydeps-pack-zvnufi85/archive.zip and then copied to /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip
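For illustration, the driver-side flow in these logs (pip install to a target folder, zip it, copy the archive to the job folder) could be sketched roughly as below. This is not the actual openeo-geopyspark-driver code: the function names `pip_install_to_target` and `archive_and_publish` are made up for this sketch, and only the pip flags are taken from the log line above.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def pip_install_to_target(dependencies, target):
    """Install requirement strings into `target` (hypothetical helper).

    The pip options mirror the ones visible in the driver log.
    """
    command = [
        sys.executable, "-m", "pip", "install",
        "--target", str(target),
        "--progress-bar", "off",
        "--disable-pip-version-check",
        "--no-input",
        "--retries", "2",
        "--timeout", "20",
        *dependencies,
    ]
    subprocess.run(command, check=True)


def archive_and_publish(packages_dir, destination):
    """Zip the contents of `packages_dir`, then copy the archive to `destination`.

    Hypothetical helper: the real implementation may differ.
    """
    packages_dir = Path(packages_dir)
    # shutil.make_archive appends ".zip" itself, which matches the
    # "archive" -> "archive.zip" naming seen in the logs.
    archive_path = shutil.make_archive(
        base_name=str(packages_dir.parent / "archive"),
        format="zip",
        root_dir=str(packages_dir),
    )
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(archive_path, destination)
    return destination
```

Usage would be something like `pip_install_to_target(["alabaster==0.7.13"], packages_dir)` followed by `archive_and_publish(packages_dir, "/batch_jobs/<job id>/udf-py-deps.zip")`.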
relevant logs from executor:
run_udf with install_mode='zip' udf_python_dependencies_archive_path='/batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip'
Extracting Python UDF dependencies from /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip to /tmp/udfpypeps-unpack-lrbgture: start 2024-10-30 11:14:45.746446
Extracting Python UDF dependencies from /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip to /tmp/udfpypeps-unpack-lrbgture: end 2024-10-30 11:14:45.760857, elapsed 0:00:00.014411
Cleaning up temporary UDF deps at /tmp/udfpypeps-unpack-lrbgture
Note how dependencies are unzipped on the fly from /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip to a temp folder /tmp/udfpypeps-unpack-lrbgture (in 0.014s)
The temp folder is automatically cleaned up after running the UDF
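As a rough sketch of the executor-side pattern described here (unzip to a temp folder, use it while the UDF runs, clean up afterwards), a context manager like the one below would do. The name `unpacked_dependencies`, the temp folder prefix and the use of `sys.path` to expose the packages are assumptions for illustration, not necessarily what the driver does.

```python
import contextlib
import shutil
import sys
import tempfile
import zipfile


@contextlib.contextmanager
def unpacked_dependencies(archive_path):
    """Unzip `archive_path` to a temp folder and expose it on sys.path.

    Hypothetical helper: the folder is removed again when the
    `with` block exits, also when the UDF raises.
    """
    unpack_dir = tempfile.mkdtemp(prefix="udfdeps-unpack-")
    try:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(unpack_dir)
        # Assumption: appending keeps the packages of the base
        # environment ahead of the extracted ones.
        sys.path.append(unpack_dir)
        try:
            yield unpack_dir
        finally:
            if unpack_dir in sys.path:
                sys.path.remove(unpack_dir)
    finally:
        shutil.rmtree(unpack_dir, ignore_errors=True)
```

The UDF would then run inside `with unpacked_dependencies(path): ...`, so the cleanup happens even if the UDF fails.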
One concern is these warnings in the executor logs:
WARNING Empty/non-existent UDF_PYTHON_DEPENDENCIES_ARCHIVE_PATH /batch_jobs/j-241030b14656437fa80e80da41ff9600/udf-py-deps.zip
Which is probably related to the S3-FUSE-mount delay issue we are seeing in other places.
After some time the file seems to exist and the UDF works. I guess I should put back the sleep config I removed with os_creodias_openeo_k8s/commits/657d38743ecacf73efdc40a5519cf3faa97d615c
Pushed some more tweaks to various pipelines.
when these become active I can give this a final test
Still struggling with the ZIP archive availability in the executors. Archive is created successfully in driver, e.g. at /batch_jobs/j-24103037146945588a44e866e66c11bd/udf-py-deps.zip, but 30 seconds later that is still not available on executors:
bumped sleeping to 30s in os_creodias_openeo_k8s/commits/2f76a7dc4c0a67a1d717c61aaee8ef7cd9c75c6 which seems to make the integration tests pass
I considered making the sleeping smarter (exponential backoff or something like that), but maybe that's for another ticket.
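To make the exponential backoff idea concrete, here is a minimal sketch of what polling for the archive could look like instead of a fixed sleep. Nothing like this exists in the code base as far as this ticket shows: `wait_for_file` and all its parameters and defaults are hypothetical.

```python
import time
from pathlib import Path


def wait_for_file(
    path,
    initial_delay=1.0,
    factor=2.0,
    max_delay=30.0,
    timeout=120.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll until `path` is a non-empty file, with exponential backoff.

    Returns True once the file is available, False on timeout.
    `sleep` and `clock` are injectable to keep this testable.
    """
    path = Path(path)
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            if path.is_file() and path.stat().st_size > 0:
                return True
        except OSError:
            # Treat transient filesystem errors like "not there yet".
            pass
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        delay = min(delay * factor, max_delay)
```

Compared to a fixed 30s sleep, this returns as soon as the file shows up and only waits longer when the mount is slow.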
did a final manual test against openeo.dev.warsaw.openeo.dataspace.copernicus.eu: j-241031f4d3254949b3f7704cd386eaab
here are relevant logs from kibana:
verifies that it works, so time to close this ticket
A more permanent solution for https://github.com/eu-cdse/openeo-cdse-infra/issues/112