mik3lol opened 5 months ago
Hey, there is an issue with mlflow in DLT that we're working on fixing. You should be able to work around it by installing this at the beginning of the DLT notebook:
%pip install git+https://github.com/WeichenXu123/mlflow.git@dlt-temp-fix
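In other words, the first cell of the DLT notebook should contain only the install, with imports coming after it. A minimal sketch (the import lines below just illustrate the ordering, they are not part of the fix):

%pip install git+https://github.com/WeichenXu123/mlflow.git@dlt-temp-fix

# next cell: the usual DLT / mlflow code
import dlt
import mlflow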
Thanks @QuentinAmbard, I'll give it a try and report back.
Added the %pip install line at the top of "01.1-DLT-Wind-Turbine-SQL" but still got the same error. Will continue to check.
@mik3lol could you try changing the DLT channel to CURRENT in the DLT setup and see if it helps?
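If you'd rather script it than click through the pipeline settings UI, the channel can also be flipped via the Pipelines REST API. A rough sketch, assuming a workspace URL, a PAT with edit rights on the pipeline, and the pipeline id from the pipeline page (all placeholders):

import requests

HOST = "https://<workspace-url>"   # placeholder
TOKEN = "<personal-access-token>"  # placeholder
PIPELINE_ID = "<pipeline-id>"      # placeholder, visible in the pipeline's URL

headers = {"Authorization": f"Bearer {TOKEN}"}

# Fetch the current pipeline spec, switch the channel, and write it back.
spec = requests.get(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers).json()["spec"]
spec["channel"] = "CURRENT"  # valid values are CURRENT and PREVIEW
requests.put(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers, json=spec)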
@mik3lol
Could you paste the error message you get after running %pip install git+https://github.com/WeichenXu123/mlflow.git@dlt-temp-fix?
I need to check the full error message string "OSError: No such file or directory: {directory path}": the dlt-temp-fix branch uses a different directory path, so the path in the error tells me whether the fix really took effect.
Could you also share a link to your DLT pipeline?
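A quick way to check which mlflow the DLT Python environment actually picked up is to print its version and install location in a cell right after the %pip install. A minimal sketch:

import mlflow

# If the dlt-temp-fix branch took effect, __file__ should point into the
# freshly installed environment rather than the cluster's built-in mlflow.
print(mlflow.__version__)
print(mlflow.__file__)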
@QuentinAmbard I just verified that both the C360 and IoT demo DLT tables are working when using the CURRENT channel.
👋 @QuentinAmbard, confirming that the default dbdemos FSI Smart Claims installation ran successfully. Trying the others now.
Both demos failed with RUN_EXECUTION_ERROR: Workload failed in one of the DLT pipeline steps, due to OSError: No such file or directory: '/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-768fe-95041-c4868-1/mlflow/models/tmp_3o1_40h/.'
Full stack trace below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 871.0 failed 4 times, most recent failure: Lost task 0.3 in stage 871.0 (TID 950) (10.0.36.202 executor 0): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 1275, in udf
    loaded_model = mlflow.pyfunc.load_model(local_model_path)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 578, in load_model
    local_path = _download_artifact_from_uri(artifact_uri=model_uri, output_path=dst_path)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/tracking/artifact_utils.py", line 100, in _download_artifact_from_uri
    return get_artifact_repository(artifact_uri=root_uri).download_artifacts(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/store/artifact/local_artifact_repo.py", line 81, in download_artifacts
    raise OSError(f"No such file or directory: '{local_artifact_path}'")
OSError: No such file or directory: '/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-768fe-95041-c4868-1/mlflow/models/tmp_3o1_40h/.'

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3897)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3819)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3806)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3806)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1685)
  at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1670)
  at scala.Option.foreach(Option.scala:407)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1670)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:4143)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4055)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4043)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:54)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 1275, in udf
    loaded_model = mlflow.pyfunc.load_model(local_model_path)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/pyfunc/__init__.py", line 578, in load_model
    local_path = _download_artifact_from_uri(artifact_uri=model_uri, output_path=dst_path)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/tracking/artifact_utils.py", line 100, in _download_artifact_from_uri
    return get_artifact_repository(artifact_uri=root_uri).download_artifacts(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-cbf62aec-d47b-48f3-a264-73a49ce0c0cf/lib/python3.10/site-packages/mlflow/store/artifact/local_artifact_repo.py", line 81, in download_artifacts
    raise OSError(f"No such file or directory: '{local_artifact_path}'")
OSError: No such file or directory: '/local_disk0/.ephemeral_nfs/repl_tmp_data/ReplId-768fe-95041-c4868-1/mlflow/models/tmp_3o1_40h/.'
  at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:550)
  at org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:117)
  at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:506)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(null:-1)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
  at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:195)
  at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:56)
  at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:92)
  at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
  at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:87)
  at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:58)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39)
  at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201)
  at org.apache.spark.scheduler.Task.doRunTask(Task.scala:186)
  at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:151)
  at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45)
  at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:103)
  at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:108)
  at scala.util.Using$.resource(Using.scala:269)
  at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:107)
  at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:145)
  at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:958)
  at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
  at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:105)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:961)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:853)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:750)
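For what it's worth, the frames above (udf → load_model in mlflow/pyfunc/__init__.py) suggest the failure happens in the executor-side model load inside mlflow.pyfunc.spark_udf: the model artifacts sit under the driver's ephemeral REPL temp directory, which apparently isn't visible from the executors. A minimal sketch of the kind of scoring step that hits this path (the model URI and DataFrame are illustrative, and spark is the notebook-provided session):

import mlflow.pyfunc

# Wrap a registered model as a Spark UDF; each worker calls
# mlflow.pyfunc.load_model on its local copy of the model, which is
# where the OSError above is raised.
predict = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/dbdemos_turbine_maintenance/Production",  # illustrative
    result_type="string",
)
scored_df = features_df.withColumn("prediction", predict(*features_df.columns))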