Open-EO / openeo-gfmap

Generic framework for EO mapping applications building on openEO
Apache License 2.0

Directly download results to output path instead of downloading to tmp first. #60

Closed VictorVerhaert closed 3 months ago

VictorVerhaert commented 3 months ago

Motivation: I am getting concurrency errors when moving the results from the tmp folder to the output dir.

Is there a reason we need to download to a tmp folder first?

I have tried skipping this step and ran into no issues. Code snippet of on_job_done:

def on_job_done(self, job: BatchJob, row: pd.Series):
        """Method called when a job finishes successfully. It will first download the results of
        the job and then call the `post_job_action` method.
        """
        job_products = {}
        for idx, asset in enumerate(job.get_results().get_assets()):
            temp_file = NamedTemporaryFile(delete=False)
            try:
                _log.debug(
                    f"Generating output path for asset {asset.name} from job {job.job_id}..."
                )
                output_path = self._output_path_gen(self._output_dir, temp_file.name, idx, row)
                # Make the output path
                output_path.parent.mkdir(parents=True, exist_ok=True)
                _log.debug(
                    f"Downloading asset {asset.name} from job {job.job_id} -> {output_path}"
                )
                asset.download(output_path)

                # _log.debug(
                #     f"Generated path for asset {asset.name} from job {job.job_id} -> {output_path}"
                # )

                # Move the temporary file to the final location
                # shutil.move(temp_file.name, output_path)
                # Add to the list of downloaded products
                job_products[f"{job.job_id}_{asset.name}"] = [output_path]
                _log.info(f"Downloaded asset {asset.name} from job {job.job_id} -> {output_path}")
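            except Exception:
                # _log.exception already logs the traceback of the active exception,
                # so the exception object must not be passed as an extra format argument
                _log.exception(f"Error downloading asset {asset.name} from job {job.job_id}")
                raise
            finally:
                # temp_file is a regular file, so shutil.rmtree(..., ignore_errors=True)
                # would silently leave it behind; close and unlink it instead
                # (assumes pathlib.Path is imported in this module)
                temp_file.close()
                Path(temp_file.name).unlink(missing_ok=True)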
    ...
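
For reference, the direct-download idea in the snippet above can be reduced to a minimal, self-contained sketch. This is not the openeo-gfmap implementation: FakeAsset, download_assets_directly and the one-folder-per-job layout are hypothetical names and choices made up for illustration, and the only thing assumed about a real asset is that it exposes a name and a download(target) method as used above.

```python
import tempfile
from pathlib import Path


class FakeAsset:
    """Hypothetical stand-in for a job result asset exposing `name` and `download(target)`."""

    def __init__(self, name: str, payload: bytes):
        self.name = name
        self._payload = payload

    def download(self, target: Path) -> Path:
        # Write the payload straight to the requested path
        Path(target).write_bytes(self._payload)
        return Path(target)


def download_assets_directly(job_id: str, assets, output_dir: Path) -> dict:
    """Download each asset to its final path, with no temporary file and no move step."""
    job_products = {}
    for asset in assets:
        # Hypothetical layout: one sub-folder per job, so parallel jobs never share a target path
        output_path = Path(output_dir) / job_id / asset.name
        output_path.parent.mkdir(parents=True, exist_ok=True)
        asset.download(output_path)
        job_products[f"{job_id}_{asset.name}"] = [output_path]
    return job_products


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        assets = [FakeAsset("a.tif", b"abc"), FakeAsset("b.tif", b"defg")]
        products = download_assets_directly("job-1", assets, Path(out_dir))
        for key, paths in products.items():
            print(key, paths[0].stat().st_size)
```

Since each target path here depends only on the job id and the asset name, two jobs finishing at the same time never write to or move the same file. Whether that holds in the real pipeline depends on what the configured output path generator returns.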

GriffinBabe commented 3 months ago

I'll take care of this one, as there are some changes to make in the extraction pipeline.