Azure / feast-azure

Azure plugins for Feast (FEAture STore)

offline_to_online_ingestion_job fails adding ingestion JAR to distributed cache #46

Closed: andrijaperovic closed this issue 2 years ago

andrijaperovic commented 2 years ago

Encountering this error in the Livy logs of Synapse when running the offline_to_online_ingestion job:

Exception in thread "main" java.lang.IllegalArgumentException: Attempt to add (abfss://feastpoc@feastpoc.dfs.core.windows.net/feast/feast-ingestion-spark-latest.jar) multiple times to the distributed cache.
    at org.apache.spark.deploy.yarn.Client.$anonfun$prepareLocalResources$23(Client.scala:633)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.deploy.yarn.Client.$anonfun$prepareLocalResources$22(Client.scala:623)
    at org.apache.spark.deploy.yarn.Client.$anonfun$prepareLocalResources$22$adapted(Client.scala:622)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:622)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:913)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:204)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1261)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1677)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:956)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:204)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1044)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1053)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
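
For context, YARN's Client.prepareLocalResources rejects any resource URI that it is asked to stage more than once, so the same abfss:// JAR path is presumably being passed both as the main application file and again in the extra jars list. A minimal sketch of deduplicating before submission, assuming the duplicate really is a repeated URI (the variable names here are hypothetical, not the launcher's actual internals):

    # Hypothetical illustration: the same abfss:// URI appearing both as the
    # main file and in the extra jars list trips YARN's distributed-cache check.
    main_file = "abfss://feastpoc@feastpoc.dfs.core.windows.net/feast/feast-ingestion-spark-latest.jar"
    extra_jars = [
        main_file,  # duplicate entry that would be added to the cache twice
    ]
    # Dropping entries equal to the main file avoids staging the JAR twice.
    deduped_jars = [jar for jar in extra_jars if jar != main_file]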

Setting the FEAST_SPARK_INGESTION_JAR environment variable to a SAS-based URL (https://feastdrivingpoc.blob.core.windows.net/feastjar/feast-ingestion-spark-latest.jar plus a SAS signature), since public access is not allowed on the storage account by default, fails with:

    job = feast_spark.Client(client).start_offline_to_online_ingestion(
  File "/Users/i868602/.local/lib/python3.8/site-packages/feast_spark/client.py", line 299, in start_offline_to_online_ingestion
    return start_offline_to_online_ingestion(
  File "/Users/i868602/.local/lib/python3.8/site-packages/feast_spark/pyspark/launcher.py", line 352, in start_offline_to_online_ingestion
    return launcher.offline_to_online_ingestion(
  File "/Users/i868602/.local/lib/python3.8/site-packages/feast_spark/pyspark/launchers/synapse/synapse.py", line 240, in offline_to_online_ingestion
    main_file = self._datalake.upload_file(ingestion_job_params.get_main_file_path())
  File "/Users/i868602/.local/lib/python3.8/site-packages/feast_spark/pyspark/launchers/synapse/synapse_utils.py", line 229, in upload_file
    with urllib.request.urlopen(local_file) as f:
  File "/usr/local/anaconda3/envs/feast_poc/lib/python3.8/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/anaconda3/envs/feast_poc/lib/python3.8/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/anaconda3/envs/feast_poc/lib/python3.8/urllib/request.py", line 641, in http_response
    response = self.parent.error(
  File "/usr/local/anaconda3/envs/feast_poc/lib/python3.8/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/anaconda3/envs/feast_poc/lib/python3.8/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/anaconda3/envs/feast_poc/lib/python3.8/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 409: Public access is not permitted on this storage account.
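
The 409 comes from synapse_utils.upload_file fetching the JAR with a plain urllib.request.urlopen, which only succeeds if the URL is directly downloadable. A minimal sketch of what the environment variable would need to look like, assuming the SAS token has to ride along in the URL's query string ("<sas-token>" below is a placeholder, not a real signature):

    import os

    # Hedged sketch: urllib.request.urlopen(local_file) in upload_file does a
    # plain HTTP GET, so the SAS token must be embedded in the URL itself.
    # "<sas-token>" is a placeholder; a real SAS query string typically
    # begins with "sv=".
    os.environ["FEAST_SPARK_INGESTION_JAR"] = (
        "https://feastdrivingpoc.blob.core.windows.net/feastjar/"
        "feast-ingestion-spark-latest.jar?<sas-token>"
    )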

Looks like this might be more of a Spark problem to do with duplicate JARs in the Spark2 directory; a quick check like the sketch below could confirm that.
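
One way to test that hypothesis, assuming shell access to the Spark pool nodes, would be to scan the Spark installation tree for JAR names that appear in more than one place (/opt/spark2 below is a placeholder for wherever the Spark2 distribution actually lives):

    from collections import Counter
    from pathlib import Path

    # Rough sketch: count JAR base names across the Spark install tree;
    # /opt/spark2 is a placeholder path, not a confirmed Synapse location.
    names = Counter(p.name for p in Path("/opt/spark2").rglob("*.jar"))
    duplicates = {name: n for name, n in names.items() if n > 1}
    print(duplicates)  # any entry here is a JAR present in two locations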

andrijaperovic commented 2 years ago

@rramani @snowmanmsft