soxofaan opened 1 week ago
cc @EmileSonneveld
As an illustration that this does not scale: the check

```python
if UDF_PYTHON_DEPENDENCIES_FOLDER_NAME in str(file_path):
```

doesn't even work anymore, as `UDF_PYTHON_DEPENDENCIES_FOLDER_NAME` is not in play on CDSE since #845.
In `export_workspace`, the list of files that exist both locally and on S3 is determined from the list of STAC metadata files, e.g. `collection.json` + `item.tiff.json`. Probably the same can be used here.
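For reference, a minimal sketch of that idea, assuming the STAC documents in the job dir carry a standard `assets` mapping with hrefs relative to the job dir (the function name and glob pattern are hypothetical, not the actual driver API):

```python
import json
from pathlib import Path


def files_from_stac_metadata(job_dir: Path) -> set[Path]:
    """Derive the set of files to sync from the STAC metadata,
    instead of from a raw directory listing (hypothetical sketch)."""
    # The STAC documents themselves (collection.json + per-item *.json).
    files = set(job_dir.glob("*.json"))
    # Plus every asset those documents reference.
    for doc_path in job_dir.glob("*.json"):
        try:
            doc = json.loads(doc_path.read_text())
        except ValueError:
            # Not valid JSON: skip defensively.
            continue
        if not isinstance(doc, dict):
            continue
        for asset in doc.get("assets", {}).values():
            href = asset.get("href") if isinstance(asset, dict) else None
            if href:
                # Assumes asset hrefs are relative to the document's folder.
                files.add((doc_path.parent / href).resolve())
    return files
```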
This issue is a direct result of changes introduced in https://github.com/Open-EO/openeo-geopyspark-driver/issues/877
Logged what files do get uploaded. On CDSE dev, all logs were one of the following two:

```
Writing results to object storage. paths=[PosixPath('/batch_jobs/j-XXX/job_specification.json'), PosixPath('/batch_jobs/j-XXX/job_metadata.json'), PosixPath('/batch_jobs/j-XXX/openEO_2017-03-07Z.tif'), PosixPath('/batch_jobs/j-XXX/openEO_2017-03-07Z.tif.aux.xml'), PosixPath('/batch_jobs/j-XXX/openEO_2017-03-07Z.tif.json'), PosixPath('/batch_jobs/j-XXX/collection.json')]
Writing results to object storage. paths=[PosixPath('/batch_jobs/j-XXX/job_specification.json'), PosixPath('/batch_jobs/j-XXX/job_metadata.json')]
```
https://github.com/Open-EO/openeo-geopyspark-driver/blob/b63280c5fc5b928a0d231ab1aec6e3b47b4b9c36/openeogeotrellis/deploy/batch_job.py#L511-L519
Here we're building an ugly ad-hoc deny-list of "files" that should not be uploaded to S3.
As mentioned in the TODO, we should instead use an explicit list of assets to upload, rather than blindly assuming that everything in the job dir should be uploaded (minus some hand-picked exceptions).
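A hedged sketch of what that could look like, under the assumption (not verified against the actual schema) that `job_metadata.json` already enumerates the result assets by file name; `ALWAYS_UPLOAD` and `paths_to_upload` are hypothetical names, not existing driver code:

```python
import json
from pathlib import Path

# Metadata files that should always accompany the assets (per the logs above).
ALWAYS_UPLOAD = {"job_specification.json", "job_metadata.json", "collection.json"}


def paths_to_upload(job_dir: Path) -> list[Path]:
    """Build an explicit upload list instead of scanning the whole job dir
    and filtering with an ad-hoc deny-list. Assumes job_metadata.json has
    an "assets" mapping keyed by file name (an assumption)."""
    metadata = json.loads((job_dir / "job_metadata.json").read_text())
    names = set(metadata.get("assets", {}).keys()) | ALWAYS_UPLOAD
    # Skip names that weren't actually written
    # (e.g. the second log line above has no collection.json).
    return [job_dir / name for name in sorted(names) if (job_dir / name).exists()]
```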