**Closed**: jstammers closed this issue 1 year ago
Ah, this is really interesting. I didn't even know SparkConf could be access controlled. Maybe we can ignore the error if SparkConf doesn't work. I will release a dev version if you want to try it.
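The workaround suggested above (treating a failed SparkConf read as non-fatal and falling back to a default) can be sketched in plain Python. Note this is a hypothetical illustration, not fugue's actual implementation: `get_conf_safely` and the `RestrictedConf` stub are invented names for demonstration.

```python
def get_conf_safely(conf, key, default=None):
    """Return conf.get(key, default), or default if the cluster blocks access."""
    try:
        return conf.get(key, default)
    except Exception:
        # On access-controlled Databricks clusters, reading SparkConf can raise
        # a py4j security error; swallow it instead of failing the whole job.
        return default


class RestrictedConf:
    """Stub mimicking a SparkConf whose getters are access controlled."""

    def get(self, key, default=None):
        raise RuntimeError("this API is not whitelisted on this cluster")


print(get_conf_safely(RestrictedConf(), "spark.app.name", "unknown"))  # unknown
```

The key design choice is that a denied config read degrades to a default rather than aborting the whole engine setup.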
Thanks @goodwanghan - I wasn't aware either, until I encountered this error. I'd be happy to try a dev version once you've made the necessary changes.
@jstammers please try 0.8.7.dev5; it may have solved this issue.
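For anyone following along, installing a pinned pre-release like the one mentioned above looks like this (the exact version string comes from the comment; pip only picks up dev releases when pinned exactly or with `--pre`):

```shell
pip install "fugue==0.8.7.dev5"
```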
That seems to work for me. Thanks for solving this so quickly!
**Minimal Code To Reproduce**

**Describe the bug**
I am trying to execute some code on a shared Databricks cluster and have encountered the following error.

**Traceback**
```bash
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-db41b783-be44-4fb2-9cf0-f9266ee90aee/lib/python3.10/site-packages/fugue/execution/api.py:1219, in aggregate(df, partition_by, engine, engine_conf, as_fugue, as_local, **agg_kwcols)
   1184 """Aggregate on dataframe
   1185
   1186 :param df: the dataframe to aggregate on
   (...)
   1213     fa.aggregate(df, "a", x=f.max(col("b")))
   1214 """
   1215 cols = [
   1216     v.alias(k) if isinstance(v, ColumnExpr) else lit(v).alias(k)
   1217     for k, v in agg_kwcols.items()
   1218 ]
-> 1219 return run_engine_function(
   1220     lambda e: e.aggregate(
   1221         as_fugue_df(df),
   1222         partition_spec=None
   1223         if partition_by is None
   1224         else PartitionSpec(by=partition_by),
   1225         agg_cols=cols,
   1226     ),
   1227     engine=engine,
   1228     engine_conf=engine_conf,
   1229     infer_by=[df],
   1230     as_fugue=as_fugue,
   1231     as_local=as_local,
   1232 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-db41b783-be44-4fb2-9cf0-f9266ee90aee/lib/python3.10/site-packages/fugue/execution/api.py:171, in run_engine_function(func, engine, engine_conf, as_fugue, as_local, infer_by)
    145 def run_engine_function(
    146     func: Callable[[ExecutionEngine], Any],
    147     engine: AnyExecutionEngine = None,
   (...)
    151     infer_by: Optional[List[Any]] = None,
    152 ) -> Any:
    153     """Run a lambda function based on the engine provided
    154
    155     :param engine: an engine like object, defaults to None
   (...)
    169     This function is for deveopment use. Users should not need it.
    170     """
--> 171 with engine_context(engine, engine_conf=engine_conf, infer_by=infer_by) as e:
    172     res = func(e)
    174 if isinstance(res, DataFrame):

File /usr/lib/python3.10/contextlib.py:281, in contextmanager.
```

From looking into this a little further, I think it's related to the fact that certain functions are disabled for security reasons on high concurrency clusters. Is there another way I can configure `fugue` to use the Spark execution engine in this instance?

**Expected behavior**
I would expect this aggregation to be executed, as it would be on other Databricks clusters I have tried.
**Environment (please complete the following information):**