aodn / aodn_cloud_optimised

Cloud optimised data formats
GNU General Public License v3.0

Parquet creation and update fail in Prefect: local variable 'client' not associated with a value #106

Open LeoLee-Xiaohu opened 4 days ago

LeoLee-Xiaohu commented 4 days ago

We created a Prefect flow that uses the aodn_cloud_optimised repo to update Parquet datasets. However, the flow fails with UnboundLocalError: cannot access local variable 'client' where it is not associated with a value.

Here is the error log from Prefect:

Encountered exception during execution:
Traceback (most recent call last):
  File "/home/xiaohul/miniconda3/envs/cloud-optimised/lib/python3.11/site-packages/prefect/engine.py", line 894, in orchestrate_flow_run
    result = await flow_call.aresult()
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xiaohul/miniconda3/envs/cloud-optimised/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 327, in aresult
    return await asyncio.wrap_future(self.future)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xiaohul/miniconda3/envs/cloud-optimised/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xiaohul/AODN/dataflow-orchestration/projects/cloud_optimised/cloud_optimised_update_flow.py", line 40, in update_optimised
    cloud_optimised_creation(
  File "/home/xiaohul/miniconda3/envs/cloud-optimised/lib/python3.11/site-packages/aodn_cloud_optimised/lib/CommonHandler.py", line 617, in cloud_optimised_creation
    handler_instance.to_cloud_optimised(s3_file_uri_list)
  File "/home/xiaohul/miniconda3/envs/cloud-optimised/lib/python3.11/site-packages/aodn_cloud_optimised/lib/GenericParquetHandler.py", line 1097, in to_cloud_optimised
    client, cluster = self.create_cluster()
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/home/xiaohul/miniconda3/envs/cloud-optimised/lib/python3.11/site-packages/aodn_cloud_optimised/lib/CommonHandler.py", line 248, in create_cluster
    client.forward_logging()
    ^^^^^^
UnboundLocalError: cannot access local variable 'client' where it is not associated with a value

It seems that a ClusterMode of None is not supported for Parquet. It would be very helpful if @lbesnard could have a look at this.
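For readers unfamiliar with this error class, here is a minimal sketch of how such an UnboundLocalError can arise and one way to guard against it. It is not the actual code in CommonHandler.py: the function names, the string mode values, and the FakeClient class are hypothetical stand-ins, and the only assumption carried over from the traceback is that forward_logging() is called on a client that was never assigned.

```python
from typing import Optional, Tuple


class FakeClient:
    """Hypothetical stand-in for a cluster client; not the real Dask client."""

    def __init__(self) -> None:
        self.logging_forwarded = False

    def forward_logging(self) -> None:
        self.logging_forwarded = True


def create_cluster_buggy(mode: Optional[str]) -> Tuple[object, object]:
    # 'client' and 'cluster' are only bound inside the branches below.
    if mode == "local":
        client, cluster = FakeClient(), "local-cluster"
    elif mode == "remote":
        client, cluster = FakeClient(), "remote-cluster"
    # With mode=None no branch runs, so the next line raises UnboundLocalError.
    client.forward_logging()
    return client, cluster


def create_cluster_fixed(mode: Optional[str]) -> Tuple[Optional[FakeClient], Optional[str]]:
    # Bind both names up front so every code path returns something defined.
    client: Optional[FakeClient] = None
    cluster: Optional[str] = None
    if mode == "local":
        client, cluster = FakeClient(), "local-cluster"
    elif mode == "remote":
        client, cluster = FakeClient(), "remote-cluster"
    # Only forward logs when a client was actually created.
    if client is not None:
        client.forward_logging()
    return client, cluster
```

In this sketch, calling create_cluster_buggy(None) raises the same kind of error as in the log above, while create_cluster_fixed(None) returns (None, None), leaving the caller to run without a cluster.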

lbesnard commented 2 days ago

@LeoLee-Xiaohu this should now be fixed in the latest version. See, for example, the unit test that now passes without a cluster value: https://github.com/aodn/aodn_cloud_optimised/blob/main/test_aodn_cloud_optimised/test_generic_parquet_handler.py#L211

The fix is here: https://github.com/aodn/aodn_cloud_optimised/blob/main/aodn_cloud_optimised/lib/GenericParquetHandler.py#L1134