Closed by morazow 2 years ago
I have looked into this further.
The issue is with the Hadoop Azure library, which still depends on the old Jackson library (org.codehaus.jackson).
172.21.0.2:54518> Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/ObjectMapper
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.parseListFilesResponse(AbfsHttpOperation.java:528)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:391)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:290)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:217)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:191)
172.21.0.2:54518> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:189)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:302)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1054)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1024)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1006)
172.21.0.2:54518> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:490)
172.21.0.2:54518> org.apache.spark.sql.delta.storage.HadoopFileSystemLogStore.listFrom(HadoopFileSystemLogStore.scala:83)
172.21.0.2:54518> org.apache.spark.sql.delta.SnapshotManagement.listFrom(SnapshotManagement.scala:62)
172.21.0.2:54518> org.apache.spark.sql.delta.SnapshotManagement.listFrom$(SnapshotManagement.scala:61)
172.21.0.2:54518> org.apache.spark.sql.delta.DeltaLog.listFrom(DeltaLog.scala:62)
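A quick way to confirm this kind of NoClassDefFoundError is to probe the classpath for the class named in the stack trace. The following is only a minimal sketch: the class name JacksonClasspathProbe and the helper isClassPresent are hypothetical and not part of this project, and the probe is only meaningful when run with the same classpath as the packaged jar.

```java
public class JacksonClasspathProbe {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isClassPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, JacksonClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class that hadoop-azure fails to load in the stack trace above.
        String legacy = "org.codehaus.jackson.map.ObjectMapper";
        // Equivalent class from the newer Jackson 2.x artifact (jackson-databind).
        String modern = "com.fasterxml.jackson.databind.ObjectMapper";

        System.out.println(legacy + " present: " + isClassPresent(legacy));
        System.out.println(modern + " present: " + isClassPresent(modern));
    }
}
```

If the legacy class is reported as missing while the newer one is present, that matches the failure above: hadoop-azure asks for the org.codehaus class by name, so the presence of jackson-databind alone does not help.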
There is an effort to replace the older Jackson versions in HADOOP-16908 (corresponding pull request: PR 3789), but that change will only be included in the upcoming 3.4.0 release.
For now, we are going to include org.codehaus.jackson:jackson-mapper-asl:1.9.13 again
and suppress the vulnerabilities reported for it.
Import query to reproduce the above exception:
IMPORT INTO TEST.TEST1
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
BUCKET_PATH = 'abfss://container@storageaccount.dfs.core.windows.net/2m5/*'
DATA_FORMAT = 'DELTA'
CONNECTION_NAME = 'AZURE_ABFS_CONNECTION'
TRUNCATE_STRING = 'true'
PARALLELISM = 'nproc()*2';
Situation
We get an error when reading from Azure Data Lake Gen2 storage using the Delta format.
This happens because the
org.codehaus.jackson:jackson-mapper-asl:1.9.13
dependency is excluded, and the replacement, com.fasterxml.jackson.core:jackson-databind:2.13.1,
is not used.
Acceptance Criteria