Azure / feast-azure

Azure plugins for Feast (FEAture STore)
MIT License

k8s launcher not supported (Python Version compatibility 3.7/3.8) #42

Open andrijaperovic opened 2 years ago

andrijaperovic commented 2 years ago

Running historical retrieval with os.environ["FEAST_SPARK_LAUNCHER"] = "k8s" fails because of a Python version incompatibility with the generated Spark Python file:

kubectl logs feast-m5mz27p6-driver -n spark-operator
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.1.33 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner wasbs://feasttest@feastdrivingpoc.blob.core.windows.net/artifacts/44d600e498692f5f4d9e183718f12fbd7925c1b97205373c181486ccab33f667.py --feature-tables W3siZmVhdHVyZXMiOiBbeyJuYW1lIjogImF2Z19kYWlseV90cmlwcyIsICJ0eXBlIjogIklOVDMyIn0sIHsibmFtZSI6ICJhY2NfcmF0ZSIsICJ0eXBlIjogIkZMT0FUIn0sIHsibmFtZSI6ICJjb252X3JhdGUiLCAidHlwZSI6ICJGTE9BVCJ9XSwgInByb2plY3QiOiAiZmVhc3Rkcml2aW5ncG9jdGVzdCIsICJuYW1lIjogImRyaXZlcl9zdGF0aXN0aWNzIiwgImVudGl0aWVzIjogW3sibmFtZSI6ICJkcml2ZXJfaWQiLCAidHlwZSI6ICJJTlQ2NCJ9XSwgIm1heF9hZ2UiOiA4NjQwMCwgImxhYmVscyI6IHt9fSwgeyJmZWF0dXJlcyI6IFt7Im5hbWUiOiAidHJpcHNfdG9kYXkiLCAidHlwZSI6ICJJTlQzMiJ9XSwgInByb2plY3QiOiAiZmVhc3Rkcml2aW5ncG9jdGVzdCIsICJuYW1lIjogImRyaXZlcl90cmlwcyIsICJlbnRpdGllcyI6IFt7Im5hbWUiOiAiZHJpdmVyX2lkIiwgInR5cGUiOiAiSU5UNjQifV0sICJtYXhfYWdlIjogODY0MDAsICJsYWJlbHMiOiB7fX1d --feature-tables-sources W3siZmlsZSI6IHsiZmllbGRfbWFwcGluZyI6IHt9LCAiZXZlbnRfdGltZXN0YW1wX2NvbHVtbiI6ICJkYXRldGltZSIsICJjcmVhdGVkX3RpbWVzdGFtcF9jb2x1bW4iOiAiY3JlYXRlZCIsICJkYXRlX3BhcnRpdGlvbl9jb2x1bW4iOiAiZGF0ZSIsICJwYXRoIjogIndhc2JzOi8vZmVhc3R0ZXN0QGZlYXN0ZHJpdmluZ3BvYy5ibG9iLmNvcmUud2luZG93cy5uZXQvZHJpdmVyX3N0YXRpc3RpY3MiLCAiZm9ybWF0IjogeyJqc29uX2NsYXNzIjogIlBhcnF1ZXRGb3JtYXQifX19LCB7ImZpbGUiOiB7ImZpZWxkX21hcHBpbmciOiB7fSwgImV2ZW50X3RpbWVzdGFtcF9jb2x1bW4iOiAiZGF0ZXRpbWUiLCAiY3JlYXRlZF90aW1lc3RhbXBfY29sdW1uIjogImNyZWF0ZWQiLCAiZGF0ZV9wYXJ0aXRpb25fY29sdW1uIjogImRhdGUiLCAicGF0aCI6ICJ3YXNiczovL2ZlYXN0dGVzdEBmZWFzdGRyaXZpbmdwb2MuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RyaXZlcl90cmlwcyIsICJmb3JtYXQiOiB7Impzb25fY2xhc3MiOiAiUGFycXVldEZvcm1hdCJ9fX1d --entity-source 
eyJmaWxlIjogeyJmaWVsZF9tYXBwaW5nIjoge30sICJldmVudF90aW1lc3RhbXBfY29sdW1uIjogImV2ZW50X3RpbWVzdGFtcCIsICJjcmVhdGVkX3RpbWVzdGFtcF9jb2x1bW4iOiAiIiwgImRhdGVfcGFydGl0aW9uX2NvbHVtbiI6ICIiLCAicGF0aCI6ICJ3YXNiczovL2ZlYXN0dGVzdEBmZWFzdGRyaXZpbmdwb2MuYmxvYi5jb3JlLndpbmRvd3MubmV0L2FydGlmYWN0cy9hODYxMTg0My0yY2QxLTQ1MjktYTliNi1mMTEwNGExYTEzYjMiLCAiZm9ybWF0IjogeyJqc29uX2NsYXNzIjogIlBhcnF1ZXRGb3JtYXQifX19 --destination eyJmb3JtYXQiOiAicGFycXVldCIsICJwYXRoIjogImFiZnNzOi8vZmVhc3Rwb2NAZmVhc3Rwb2MuZGZzLmNvcmUud2luZG93cy5uZXQvZmVhc3Qvb3V0LzQ5ODQ1MzRhLWJiOGUtNDVmZi1hYzc3LTM2YzE3ZjA2YjRhOCJ9
22/01/23 22:43:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/23 22:43:02 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
  File "/tmp/spark-ac4d3d12-c430-4f11-987a-7a193b8fef54/44d600e498692f5f4d9e183718f12fbd7925c1b97205373c181486ccab33f667.py", line 73
    event_timestamp_column: str,
                          ^
SyntaxError: invalid syntax
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
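For reference, the k8s launcher is selected with a single environment variable, as quoted at the top of this issue; presumably it has to be set before the Feast client constructs the Spark job. A minimal sketch:

```python
import os

# Select the Kubernetes launcher for feast_spark job submission
# (the setting quoted at the top of this issue).
os.environ["FEAST_SPARK_LAUNCHER"] = "k8s"
```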

Upgrading to Python 3.8 from source in the gcr.io/kf-feast/spark-py:v3.0.1 image does not appear to fix the issue either. I am following the historical features example: https://github.com/Azure/feast-azure/blob/main/cluster/samples/feature_store_azure.ipynb
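The SyntaxError points at a parameter annotation, which only Python 3 can parse, so the caret position suggests the generated script is still being executed by a Python 2 interpreter inside the driver image rather than by the upgraded 3.8 build. A minimal check (the function name here is hypothetical, echoing the failing line in the log):

```python
import sys

# Parameter annotations (PEP 3107) are Python-3-only syntax; a Python 2
# interpreter raises SyntaxError at the ':' after the parameter name --
# the same caret position shown in the driver log above.
src = "def retrieve(event_timestamp_column: str): pass"

assert sys.version_info[0] >= 3, "the annotation below would be a SyntaxError on Python 2"
namespace = {}
exec(compile(src, "<generated>", "exec"), namespace)  # parses fine on Python 3
```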

andrijaperovic commented 2 years ago

I tried modifying the image in the SparkApplication job template of /usr/local/lib/python3.8/site-packages/feast_spark/pyspark/launchers/k8s/k8s_utils.py to use gcr.io/kf-feast/feast-spark:latest, but I am getting a ClassNotFoundException for NativeAzureFileSystem in the Spark driver logs (presumably this class needs to be on Spark's classpath):

+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.1.45 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner wasbs://feasttest@feastdrivingpoc.blob.core.windows.net/artifacts/6235b5f4f1c32536794b4808ca5fbbcc004826f33ef1b2ec69b4a93ddf6fbd07.py --feature-tables W3siZmVhdHVyZXMiOiBbeyJuYW1lIjogImF2Z19kYWlseV90cmlwcyIsICJ0eXBlIjogIklOVDMyIn0sIHsibmFtZSI6ICJhY2NfcmF0ZSIsICJ0eXBlIjogIkZMT0FUIn0sIHsibmFtZSI6ICJjb252X3JhdGUiLCAidHlwZSI6ICJGTE9BVCJ9XSwgInByb2plY3QiOiAiZmVhc3Rkcml2aW5ncG9jdGVzdCIsICJuYW1lIjogImRyaXZlcl9zdGF0aXN0aWNzIiwgImVudGl0aWVzIjogW3sibmFtZSI6ICJkcml2ZXJfaWQiLCAidHlwZSI6ICJJTlQ2NCJ9XSwgIm1heF9hZ2UiOiA4NjQwMCwgImxhYmVscyI6IHt9fSwgeyJmZWF0dXJlcyI6IFt7Im5hbWUiOiAidHJpcHNfdG9kYXkiLCAidHlwZSI6ICJJTlQzMiJ9XSwgInByb2plY3QiOiAiZmVhc3Rkcml2aW5ncG9jdGVzdCIsICJuYW1lIjogImRyaXZlcl90cmlwcyIsICJlbnRpdGllcyI6IFt7Im5hbWUiOiAiZHJpdmVyX2lkIiwgInR5cGUiOiAiSU5UNjQifV0sICJtYXhfYWdlIjogODY0MDAsICJsYWJlbHMiOiB7fX1d --feature-tables-sources W3siZmlsZSI6IHsiZmllbGRfbWFwcGluZyI6IHt9LCAiZXZlbnRfdGltZXN0YW1wX2NvbHVtbiI6ICJkYXRldGltZSIsICJjcmVhdGVkX3RpbWVzdGFtcF9jb2x1bW4iOiAiY3JlYXRlZCIsICJkYXRlX3BhcnRpdGlvbl9jb2x1bW4iOiAiZGF0ZSIsICJwYXRoIjogIndhc2JzOi8vZmVhc3R0ZXN0QGZlYXN0ZHJpdmluZ3BvYy5ibG9iLmNvcmUud2luZG93cy5uZXQvZHJpdmVyX3N0YXRpc3RpY3MiLCAiZm9ybWF0IjogeyJqc29uX2NsYXNzIjogIlBhcnF1ZXRGb3JtYXQifX19LCB7ImZpbGUiOiB7ImZpZWxkX21hcHBpbmciOiB7fSwgImV2ZW50X3RpbWVzdGFtcF9jb2x1bW4iOiAiZGF0ZXRpbWUiLCAiY3JlYXRlZF90aW1lc3RhbXBfY29sdW1uIjogImNyZWF0ZWQiLCAiZGF0ZV9wYXJ0aXRpb25fY29sdW1uIjogImRhdGUiLCAicGF0aCI6ICJ3YXNiczovL2ZlYXN0dGVzdEBmZWFzdGRyaXZpbmdwb2MuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RyaXZlcl90cmlwcyIsICJmb3JtYXQiOiB7Impzb25fY2xhc3MiOiAiUGFycXVldEZvcm1hdCJ9fX1d --entity-source 
eyJmaWxlIjogeyJmaWVsZF9tYXBwaW5nIjoge30sICJldmVudF90aW1lc3RhbXBfY29sdW1uIjogImV2ZW50X3RpbWVzdGFtcCIsICJjcmVhdGVkX3RpbWVzdGFtcF9jb2x1bW4iOiAiIiwgImRhdGVfcGFydGl0aW9uX2NvbHVtbiI6ICIiLCAicGF0aCI6ICJ3YXNiczovL2ZlYXN0dGVzdEBmZWFzdGRyaXZpbmdwb2MuYmxvYi5jb3JlLndpbmRvd3MubmV0L2FydGlmYWN0cy8xMjgyZDg1NC00YjkxLTQ2MDItOTg0OS0xMjQ3ZTUzYWFhMTkiLCAiZm9ybWF0IjogeyJqc29uX2NsYXNzIjogIlBhcnF1ZXRGb3JtYXQifX19 --destination eyJmb3JtYXQiOiAicGFycXVldCIsICJwYXRoIjogImFiZnNzOi8vZmVhc3Rwb2NAZmVhc3Rwb2MuZGZzLmNvcmUud2luZG93cy5uZXQvZmVhc3Qvb3V0LzA0OWJmMWU3LTNmYTctNGNlZi1iNzZiLTRlYmI3MDBjZDczMyJ9
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/feast-ingestion-spark-v0.2.17.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.0.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/01/24 04:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2595)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3269)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
    at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:138)
    at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$8(SparkSubmit.scala:376)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:376)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2499)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2593)
    ... 19 more

Is there a way to set the spark.jars.packages Spark configuration property (or similar) via Feast environment variables? I tried modifying the SparkSession configuration in PySpark directly, but that did not seem to work.
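Outside of Feast, the missing class normally ships in the hadoop-azure connector, so one way to test the hypothesis is to pull it in through spark.jars.packages when the session is built. This is a configuration sketch only — the Maven coordinates and versions are assumptions that must match the image's Hadoop build, and this is not a documented feast_spark configuration path:

```python
from pyspark.sql import SparkSession

# Configuration sketch (assumed versions): fetch the Azure filesystem
# connector from Maven so wasbs:// URIs resolve, and map the wasbs
# scheme to NativeAzureFileSystem$Secure, the class reported missing
# in the stack trace above.
spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-azure:2.7.4,"
            "com.microsoft.azure:azure-storage:7.0.1")
    .config("spark.hadoop.fs.wasbs.impl",
            "org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure")
    .getOrCreate()
)
```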

rramani commented 2 years ago

Hi @andrijaperovic , where are you running Spark?

@snowmanmsft fyi.

andrijaperovic commented 2 years ago

Hi @rramani, I'm running Spark on AKS 1.22 using spark-operator. However, this is now a lower-priority issue for us, since the Synapse launcher is working fine.