Motivation: I am getting concurrency errors when moving the results from the tmp folder to the output dir.
Is there a reason we need to download to a tmp folder first?
I have tried skipping this step and ran into no issues.
Code snippet of my modified `on_job_done`:
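import logging
from pathlib import Path
from tempfile import NamedTemporaryFile

import pandas as pd
from openeo.rest.job import BatchJob

_log = logging.getLogger(__name__)


def on_job_done(self, job: BatchJob, row: pd.Series):
    """Method called when a job finishes successfully. It will first download the results of
    the job and then call the `post_job_action` method.
    """
    job_products = {}
    for idx, asset in enumerate(job.get_results().get_assets()):
        temp_file = NamedTemporaryFile(delete=False)
        try:
            _log.debug(
                f"Generating output path for asset {asset.name} from job {job.job_id}..."
            )
            # NOTE: with the move step skipped, temp_file is never written to,
            # so the path generator only ever receives the name of an empty file
            output_path = self._output_path_gen(self._output_dir, temp_file.name, idx, row)
            # Make sure the output directory exists
            output_path.parent.mkdir(parents=True, exist_ok=True)
            _log.debug(
                f"Downloading asset {asset.name} from job {job.job_id} -> {output_path}"
            )
            # Download straight to the final location instead of the temp file
            asset.download(output_path)
            # _log.debug(
            #     f"Generated path for asset {asset.name} from job {job.job_id} -> {output_path}"
            # )
            # Move the temporary file to the final location
            # shutil.move(temp_file.name, output_path)
            # Add to the list of downloaded products
            job_products[f"{job.job_id}_{asset.name}"] = [output_path]
            _log.info(f"Downloaded asset {asset.name} from job {job.job_id} -> {output_path}")
        except Exception:
            # _log.exception already records the traceback; passing `e` as an extra
            # argument causes a formatting error inside the logging call
            _log.exception(f"Error downloading asset {asset.name} from job {job.job_id}")
            raise
        finally:
            # temp_file is a file, not a directory, so shutil.rmtree(..., ignore_errors=True)
            # silently does nothing and leaks it; close and delete the file instead
            temp_file.close()
            Path(temp_file.name).unlink(missing_ok=True)
    ...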
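For reference, a common reason to download to a temporary file first is to make the final file appear atomically, so that nothing ever sees a half-written result. That only holds if the temp file sits on the same filesystem as the target. `NamedTemporaryFile()` defaults to the system temp dir, and when source and destination are on different filesystems, `shutil.move` falls back to copy-then-delete, which is not atomic. Below is a minimal sketch of that pattern under those assumptions. It creates the temp file next to the output and renames it with `os.replace`. `download_atomically` is a hypothetical helper, not part of openeo or the job manager, and this is not necessarily why the original code uses a tmp folder.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def download_atomically(download: Callable[[Path], None], output_path: Path) -> Path:
    """Hypothetical helper: write via a temp file in the destination dir, then rename.

    `download` is any callable that writes the data to the path it is given
    (assumed here to be something like `asset.download`).
    """
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Create a uniquely named temp file in the *same* directory as the target,
    # so concurrent workers never share it and the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=output_path.parent, prefix=f".{output_path.name}.", suffix=".part"
    )
    os.close(fd)  # the downloader opens the path itself
    tmp_path = Path(tmp_name)
    try:
        download(tmp_path)
        # On POSIX, os.replace within one filesystem is atomic: readers see
        # either no file (or the old one) or the complete new file.
        os.replace(tmp_path, output_path)
    except BaseException:
        # Never leave a partial download behind
        tmp_path.unlink(missing_ok=True)
        raise
    return output_path
```

Inside the loop above, this would be called as `download_atomically(asset.download, output_path)`. If downloading directly to `output_path` already works for you, the main thing this adds is that a crashed or interrupted download never leaves a truncated file at the final path.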