feathr-ai / feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

NYC Taxi Demo failing at materialize_features with RuntimeError: Spark job failed. #419

Closed · blrchen closed this issue 2 years ago

blrchen commented 2 years ago

First, remove feathr_runtime_location from the config yaml; this tells Feathr to use the runtime jar from the Maven source. Then run the notebook, and it fails at materialize_features with RuntimeError: Spark job failed.

The error does not occur if feathr_runtime_location is set to https://azurefeathrstorage.blob.core.windows.net/public/feathr-assembly-LATEST.jar, so it seems specific to the runtime jar resolved from Maven.
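
For reference, the toggle lives in the spark_config section of feathr_config.yaml. The sketch below is a minimal, illustrative example assuming the Databricks layout used by the getting-started samples; the workspace URL and work_dir values are placeholders, and feathr_runtime_location is the only setting at issue here:

spark_config:
  spark_cluster: databricks
  databricks:
    workspace_instance_url: "https://<your-workspace>.azuredatabricks.net/"   # placeholder
    work_dir: "dbfs:/feathr_getting_started"                                  # placeholder
    # With feathr_runtime_location omitted, the client falls back to the Maven package
    # (com.linkedin.feathr:feathr_2.12:0.4.0 in the log below); this is the failing case.
    # Pointing it at the published assembly jar avoids the error:
    # feathr_runtime_location: "https://azurefeathrstorage.blob.core.windows.net/public/feathr-assembly-LATEST.jar"

Both variants are meant to run the same Feathr runtime; the only difference is whether the jar is resolved from Maven or downloaded from the storage account.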

Logs:

2022-06-30 19:00:31.247 | INFO     | feathr.spark_provider._databricks_submission:submit_feathr_job:155 - Main JAR file is not set, using default package 'com.linkedin.feathr:feathr_2.12:0.4.0' from Maven
2022-06-30 19:00:31.248 | INFO     | feathr.spark_provider._databricks_submission:upload_or_get_cloud_path:87 - Skip uploading file dbfs:/feathr_getting_started/feathr_pyspark_driver.py as the file starts with dbfs:/
2022-06-30 19:00:31.745 | INFO     | feathr.spark_provider._databricks_submission:submit_feathr_job:181 - Feathr job Submitted Successfully. View more details here: https://adb-1948202983662686.6.azuredatabricks.net/?o=1948202983662686#job/747225329813358/run/112140
2022-06-30 19:00:31.834 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: PENDING
2022-06-30 19:01:01.960 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: PENDING
2022-06-30 19:01:32.167 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: PENDING
2022-06-30 19:02:02.296 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: PENDING
2022-06-30 19:02:32.436 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: RUNNING
2022-06-30 19:03:02.571 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: RUNNING
2022-06-30 19:03:32.701 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: RUNNING
2022-06-30 19:04:02.882 | DEBUG    | feathr.spark_provider._databricks_submission:wait_for_completion:192 - Current Spark job status: FAILED
2022-06-30 19:04:03.252 | ERROR    | feathr.spark_provider._databricks_submission:wait_for_completion:202 - Feathr job has failed. Please visit this page to view error message: https://adb-1948202983662686.6.azuredatabricks.net/?o=1948202983662686#job/747225329813358/run/112140
2022-06-30 19:04:03.252 | ERROR    | feathr.spark_provider._databricks_submission:wait_for_completion:204 - Error Code: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 4 times, most recent failure: Lost task 0.3 in stage 12.0 (TID 19) (10.139.64.4 executor 0): java.lang.IllegalAccessError: tried to access field com.google.protobuf.AbstractMessage.memoizedSize from class com.linkedin.feathr.common.types.protobuf.FeatureValueOuterClass$FeatureValue
2022-06-30 19:04:03.252 | ERROR    | feathr.spark_provider._databricks_submission:wait_for_completion:206 - ---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
    104 
    105 print("pyspark_client.py: Preprocessing via UDFs and submit Spark job.")
--> 106 submit_spark_job(feature_names_funcs)
    107 print("pyspark_client.py: Feathr Pyspark job completed.")
    108 

/tmp/tmp0gjdxdw4.py in submit_spark_job(feature_names_funcs)
     83     print(new_preprocessed_df_map)
     84 
---> 85     py4j_feature_job.mainWithPreprocessedDataFrame(job_param_java_array, new_preprocessed_df_map)
     86     return None
     87 

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling z:com.linkedin.feathr.offline.job.FeatureGenJob.mainWithPreprocessedDataFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 4 times, most recent failure: Lost task 0.3 in stage 12.0 (TID 19) (10.139.64.4 executor 0): java.lang.IllegalAccessError: tried to access field com.google.protobuf.AbstractMessage.memoizedSize from class com.linkedin.feathr.common.types.protobuf.FeatureValueOuterClass$FeatureValue
    at com.linkedin.feathr.common.types.protobuf.FeatureValueOuterClass$FeatureValue.getSerializedSize(FeatureValueOuterClass.java:1296)
    at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:64)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$getConversionFunction$2(RedisOutputUtils.scala:100)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2(RedisOutputUtils.scala:52)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2$adapted(RedisOutputUtils.scala:49)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$1(RedisOutputUtils.scala:49)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.deserializetoobject_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:757)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1209)
    at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1215)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at org.apache.spark.sql.redis.RedisSourceRelation.$anonfun$insert$5(RedisSourceRelation.scala:125)
    at org.apache.spark.sql.redis.RedisSourceRelation.$anonfun$insert$5$adapted(RedisSourceRelation.scala:123)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1025)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1025)
    at org.apache.spark.SparkContext.$anonfun$runJob$2(SparkContext.scala:2517)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:150)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:119)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:91)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:813)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1657)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:816)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:672)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2828)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2775)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2769)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2769)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1305)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1305)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1305)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3036)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2977)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2965)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1067)
    at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2477)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2460)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2498)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2517)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2542)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:1025)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:125)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:419)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:1023)
    at org.apache.spark.sql.Dataset.$anonfun$foreachPartition$1(Dataset.scala:2949)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.Dataset.$anonfun$withNewRDDExecutionId$1(Dataset.scala:3814)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:130)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:273)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:104)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:223)
    at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3812)
    at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2949)
    at org.apache.spark.sql.redis.RedisSourceRelation.insert(RedisSourceRelation.scala:123)
    at org.apache.spark.sql.redis.DefaultSource.createRelation(DefaultSource.scala:24)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:96)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:213)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:257)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:253)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:209)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:167)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:166)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:1080)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:130)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:273)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:104)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
    at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:223)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:1080)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:469)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:439)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:312)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.writeToRedis(RedisOutputUtils.scala:38)
    at com.linkedin.feathr.offline.generation.outputProcessor.PushToRedisOutputProcessor.processSingle(PushToRedisOutputProcessor.scala:29)
    at com.linkedin.feathr.offline.generation.outputProcessor.WriteToHDFSOutputProcessor.$anonfun$processAllHelper$8(WriteToHDFSOutputProcessor.scala:107)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.linkedin.feathr.offline.generation.outputProcessor.WriteToHDFSOutputProcessor.processAllHelper(WriteToHDFSOutputProcessor.scala:101)
    at com.linkedin.feathr.offline.generation.outputProcessor.WriteToHDFSOutputProcessor.processAll(WriteToHDFSOutputProcessor.scala:46)
    at com.linkedin.feathr.offline.generation.DataFrameFeatureGenerator.$anonfun$generateFeaturesAsDF$12(DataFrameFeatureGenerator.scala:108)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.linkedin.feathr.offline.generation.DataFrameFeatureGenerator.generateFeaturesAsDF(DataFrameFeatureGenerator.scala:108)
    at com.linkedin.feathr.offline.client.FeathrClient.generateFeatures(FeathrClient.scala:96)
    at com.linkedin.feathr.offline.job.FeatureGenJob$.run(FeatureGenJob.scala:137)
    at com.linkedin.feathr.offline.job.FeatureGenJob$.run(FeatureGenJob.scala:94)
    at com.linkedin.feathr.offline.job.FeatureGenJob$.process(FeatureGenJob.scala:254)
    at com.linkedin.feathr.offline.job.FeatureGenJob$.main(FeatureGenJob.scala:265)
    at com.linkedin.feathr.offline.job.FeatureGenJob$.mainWithPreprocessedDataFrame(FeatureGenJob.scala:261)
    at com.linkedin.feathr.offline.job.FeatureGenJob.mainWithPreprocessedDataFrame(FeatureGenJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalAccessError: tried to access field com.google.protobuf.AbstractMessage.memoizedSize from class com.linkedin.feathr.common.types.protobuf.FeatureValueOuterClass$FeatureValue
    at com.linkedin.feathr.common.types.protobuf.FeatureValueOuterClass$FeatureValue.getSerializedSize(FeatureValueOuterClass.java:1296)
    at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:64)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$getConversionFunction$2(RedisOutputUtils.scala:100)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2(RedisOutputUtils.scala:52)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2$adapted(RedisOutputUtils.scala:49)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.Range.foreach(Range.scala:158)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$1(RedisOutputUtils.scala:49)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.deserializetoobject_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:757)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1209)
    at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1215)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at org.apache.spark.sql.redis.RedisSourceRelation.$anonfun$insert$5(RedisSourceRelation.scala:125)
    at org.apache.spark.sql.redis.RedisSourceRelation.$anonfun$insert$5$adapted(RedisSourceRelation.scala:123)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1025)
    at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1025)
    at org.apache.spark.SparkContext.$anonfun$runJob$2(SparkContext.scala:2517)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:150)
    at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:119)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.scheduler.Task.run(Task.scala:91)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:813)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1657)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:816)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:672)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Traceback (most recent call last):
  File "/home/blairch/workspace/nyc_driver_databricks.py", line 235, in <module>

  File "/mnt/d/github/feathr/feathr_project/feathr/client.py", line 636, in wait_job_to_finish
    raise RuntimeError('Spark job failed.')
RuntimeError: Spark job failed.
blrchen commented 2 years ago

Will verify once the v0.6 jar is released to Maven.

blrchen commented 2 years ago

Closing as a duplicate of #616.