lucasmo opened 3 months ago
https://github.com/apache/hudi/issues/11378 appears to be caused by this same issue
Here is a reproducer script:
#!/usr/bin/env bash
# Download the jars needed on the class path, then launch jshell and ask
# Avro's SpecificData for the class backing the HoodieCleanPartitionMetadata
# schema (an Avro-generated class shipped in hudi-common).
MAVEN="https://repo1.maven.org/maven2"
ARTIFACTS="\
org/apache/avro/avro/1.11.3/avro-1.11.3.jar \
com/fasterxml/jackson/core/jackson-core/2.17.1/jackson-core-2.17.1.jar \
com/fasterxml/jackson/core/jackson-databind/2.17.1/jackson-databind-2.17.1.jar \
com/fasterxml/jackson/core/jackson-annotations/2.17.1/jackson-annotations-2.17.1.jar \
org/slf4j/slf4j-api/2.0.9/slf4j-api-2.0.9.jar \
org/apache/hudi/hudi-common/0.14.0/hudi-common-0.14.0.jar \
"
CLASSPATH=""
for artifact in $ARTIFACTS; do
  # Fetch each artifact into the current directory and append it to the class path
  curl -O "${MAVEN}/${artifact}"
  jar=$(basename "$artifact")
  CLASSPATH="${CLASSPATH}:${jar}"
done
echo "$CLASSPATH"
echo 'org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"HoodieCleanPartitionMetadata\",\"namespace\":\"org.apache.hudi.avro.model\",\"fields\":[{\"name\":\"partitionPath\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"policy\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"deletePathPatterns\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"successDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"failedDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"isPartitionDeleted\",\"type\":[\"null\",\"boolean\"],\"default\":null}]}"); System.out.println("Class for schema: " + org.apache.avro.specific.SpecificData.get().getClass(schema));' |\
jshell --class-path "${CLASSPATH}"
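For comparison, the same check passes when the spark3.4 bundle jar is substituted for hudi-common (see the issue description below). A minimal sketch, assuming the standard Maven layout for that artifact and the helper jars already downloaded by the script:

# Hypothetical comparison run: fetch the bundle jar and repeat the jshell
# check with it in place of hudi-common on the class path
curl -O "https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4-bundle_2.12/0.14.0/hudi-spark3.4-bundle_2.12-0.14.0.jar"
jshell --class-path "avro-1.11.3.jar:jackson-core-2.17.1.jar:jackson-databind-2.17.1.jar:jackson-annotations-2.17.1.jar:slf4j-api-2.0.9.jar:hudi-spark3.4-bundle_2.12-0.14.0.jar"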
Because build profiles vary across Spark and Flink versions, and the Avro version changes over time with them, we don't expect the hudi-common jars in the Maven repo to work for all Spark/Flink versions; this causes the compatibility issues you're seeing. We expect people to use only the Hudi bundle jars, like hudi-spark3.5-bundle, hudi-utilities-slim-bundle, hudi-flink1.18-bundle, etc.
@xushiyan understood. I am not an XTable developer. However, it seems pretty clear that the issue is with corrupted classes, not a Spark version.
I have asked the XTable devs in the linked ticket to comment here. I'm not sure what I can do to make this move forward.
@lucasmo we should be able to fix this in 0.16.0 (tracking in https://issues.apache.org/jira/browse/HUDI-8028)
In the meantime, if you want the hudi-common jar to work, you can build the project with the spark3.4 or spark3.5 profile, which will produce a hudi-common jar that includes a compatible avro dependency for your Spark version (assuming you're using Spark 3.4 or 3.5)
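A minimal sketch of that local build, assuming the release-0.14.0 tag and the flags quoted in the issue description below:

# Hypothetical build: check out the 0.14.0 release tag and build with the
# spark3.4 profile so hudi-common picks up a compatible Avro version
git clone https://github.com/apache/hudi.git && cd hudi
git checkout release-0.14.0
mvn clean package -DskipTests -Dspark3.4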
Describe the problem you faced
When diagnosing a problem with XTable (see https://github.com/apache/incubator-xtable/issues/466), I noticed that Avro-generated classes could not even be instantiated for a schema in a very simple test case when using hudi-common-0.14.0 as a dependency. However, this issue does not exist when using hudi-spark3.4-bundle_2.12-0.14.0 as a dependency, which contains the same Avro autogenerated classes. A good specific example is org/apache/hudi/avro/model/HoodieCleanPartitionMetadata.class.
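One way to inspect that class in each jar (a sketch using stock JDK and zip tooling; jar names assumed to match the downloads above):

# Jars are zip archives, so extract the single generated class file...
unzip -o hudi-common-0.14.0.jar org/apache/hudi/avro/model/HoodieCleanPartitionMetadata.class
# ...and disassemble it to see which Avro APIs it was compiled against
javap -p org/apache/hudi/avro/model/HoodieCleanPartitionMetadata.class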
When compiling Hudi locally (tag release-0.14.0, mvn clean package -DskipTests -Dspark3.4, Java 1.8), both generated jar files have the correct implementations of the Avro autogenerated classes.

To Reproduce
Steps to reproduce the behavior:
Inspect org/apache/hudi/avro/model/HoodieCleanPartitionMetadata.class in all four of the jars (the two from Maven Central and the two built locally), OR
run the following in Java 11, replacing $PATH_TO_A_HOODIE_AVRO_MODELS_JAR with a path to one of the four jar files.
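A minimal sketch of such an invocation, assuming the Avro, Jackson, and slf4j jars downloaded by the reproducer script above are in the current directory:

jshell --class-path "avro-1.11.3.jar:jackson-core-2.17.1.jar:jackson-databind-2.17.1.jar:jackson-annotations-2.17.1.jar:slf4j-api-2.0.9.jar:$PATH_TO_A_HOODIE_AVRO_MODELS_JAR"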
Then, copy and paste this into the shell:
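Presumably this is the same two-statement snippet used by the reproducer script above:

org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"HoodieCleanPartitionMetadata\",\"namespace\":\"org.apache.hudi.avro.model\",\"fields\":[{\"name\":\"partitionPath\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"policy\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"deletePathPatterns\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"successDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"failedDeleteFiles\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}},{\"name\":\"isPartitionDeleted\",\"type\":[\"null\",\"boolean\"],\"default\":null}]}");
System.out.println("Class for schema: " + org.apache.avro.specific.SpecificData.get().getClass(schema));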
On the Maven Central hudi-common-0.14.0 jar, you should get:
Expected behavior
The above code snippet prints the generated class for the schema, presumably:

Class for schema: class org.apache.hudi.avro.model.HoodieCleanPartitionMetadata
Environment Description
Everything else n/a, but the issue was reproduced on both macOS and Ubuntu 22.04.