Closed: buinauskas closed this issue 6 months ago
Hi,
Just checking: for your step 2, did you evolve your case class TranslationKey
or TranslationKey
and then redeploy?
Hi @AlexLi-Monster, no, it's the same codebase recompiled and redeployed. I'm in the process of migrating from the deprecated Scala API and am testing various scenarios.
@buinauskas When you say "I need to make some changes to the already existing job", does that mean you switched the Flink TypeInformation from the Apache Flink Scala API to this library's Scala API?
The official Flink Scala API (already deprecated) stores its own serializers in a savepoint/checkpoint, which can NOT be read by this library, because each library uses its own code to serialize and deserialize state. That is why you see different hash codes in the error message.
Could you check whether this article explains the problem and offers a solution? https://ververica.zendesk.com/hc/en-us/articles/10627965111068-How-to-migrate-a-savepoint-created-using-Apache-Flink-Scala-API-to-another-format
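The incompatibility can be sketched in plain Scala. This is a hypothetical illustration, not the real Flink serializers: two "libraries" each write the same case class in their own wire format, so bytes produced by one cannot be read by the other, which is what a savepoint restore across serializer implementations runs into.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stand-in for the state type mentioned in this thread.
case class TranslationKey(locale: String, id: Long)

object FormatDemo {
  // "Library A": plain Java serialization.
  def encodeA(k: TranslationKey): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(k)
    oos.close()
    bos.toByteArray
  }

  // "Library B": a hand-rolled delimited format.
  def encodeB(k: TranslationKey): Array[Byte] =
    s"${k.locale}|${k.id}".getBytes("UTF-8")

  // B can only read what B wrote; feeding it A's bytes fails.
  def decodeB(bytes: Array[Byte]): TranslationKey = {
    val Array(locale, id) = new String(bytes, "UTF-8").split('|')
    TranslationKey(locale, id.toLong)
  }
}
```

`decodeB(encodeB(k))` round-trips, but `decodeB(encodeA(k))` blows up on Java serialization's binary header: the same logical value, two mutually unreadable formats.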
Hey, the change is not from one API to another.
The job is bootstrapped using this lib Scala API and there are no changes to the type information.
This exact exception happened by literally recompiling the same project and redeploying it to the same cluster from the snapshot.
I suspect this could be an issue with the fact that we run in session mode, and with class loading.
I'll go through your suggested article to see if I'm missing something, thank you.
Hm... this could indeed be related to different classes being loaded in your Flink job/cluster runtime. Something in the environment definitely changed and led to a slightly different ScalaCaseClassSerializer object.
It is an interesting case to solve and share with the community. Let us know if you find the root cause.
@buinauskas you might want to review the Flink documentation on class loading:
I would double-check whether the session cluster is pre-built with different Scala dependencies than your job jar (uber jar?) contains.
I came to the same conclusion yesterday but held off on updating this issue. My session cluster is pre-built with:
FLINK_SCALA_API_VERSION=1.18.1_1.1.3
MAGNOLIA_VERSION=1.1.8
#Flink scala api
curl https://repo1.maven.org/maven2/org/flinkextended/flink-scala-api_$SCALA_MAJOR_VERSION/$FLINK_SCALA_API_VERSION/flink-scala-api_${SCALA_MAJOR_VERSION}-${FLINK_SCALA_API_VERSION}.jar --fail --output /opt/flink/lib/flink-scala-api_$SCALA_MAJOR_VERSION-${FLINK_SCALA_API_VERSION}.jar
curl https://repo1.maven.org/maven2/com/softwaremill/magnolia1_2/magnolia_$SCALA_MAJOR_VERSION/$MAGNOLIA_VERSION/magnolia_$SCALA_MAJOR_VERSION-$MAGNOLIA_VERSION.jar --fail --output /opt/flink/lib/magnolia_$SCALA_MAJOR_VERSION-$MAGNOLIA_VERSION.jar
Packaging only the flink-scala-api library into the job jar, excluding its transitive dependencies, has helped with the issue:
implementation "org.flinkextended:flink-scala-api_$scalaMajorVersion:$flinkScalaApiVersion"
flinkShadowJar("org.flinkextended:flink-scala-api_$scalaMajorVersion:$flinkScalaApiVersion") {
exclude group: 'org.apache.flink'
exclude group: 'org.scalameta'
exclude group: 'org.scala-lang'
}
But now there's a similar issue: when I start the job, classes are not sent over the wire correctly. I suspect a similar root cause and will investigate further.
Fascinating. This issue could be related to Flink's serialization approach itself. I guess that because of the two different class loaders involved in this situation (the system classloader and the user classloader), the recovery procedure results in a different serializer object, so Flink cannot read the savepoint because it thinks the provided serializer is unknown.
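The two-classloader theory can be demonstrated in isolation. This is a self-contained sketch, not Flink code: the same class file loaded through two independent classloaders yields two distinct Class objects, so instance and compatibility checks fail even though the bytecode is identical.

```scala
import java.io.File
import java.net.URLClassLoader
import java.nio.file.Files
import javax.tools.ToolProvider

object ClassLoaderDemo {
  // Compile a trivial class at runtime (a hypothetical stand-in for a
  // serializer class present both in /opt/flink/lib and inside the uber jar),
  // then load the very same .class file through two independent loaders.
  def loadTwice(): (Class[_], Class[_]) = {
    val dir = Files.createTempDirectory("cl-demo").toFile
    val src = new File(dir, "Ser.java")
    Files.write(src.toPath, "public class Ser {}".getBytes("UTF-8"))
    ToolProvider.getSystemJavaCompiler.run(null, null, null, src.getPath)

    val urls = Array(dir.toURI.toURL)
    val c1 = new URLClassLoader(urls, null).loadClass("Ser")
    val c2 = new URLClassLoader(urls, null).loadClass("Ser")
    (c1, c2)
  }

  def main(args: Array[String]): Unit = {
    val (c1, c2) = loadTwice()
    // Same name, same bytecode, yet the JVM treats them as distinct classes:
    println(c1.getName == c2.getName) // true
    println(c1 == c2)                 // false
  }
}
```

In Flink terms: if the cluster's parent classloader and the job's user-code classloader can each resolve the serializer class, the restored snapshot may see a "different" class than the running job does, which would surface exactly as an unknown-serializer failure.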
Indeed that's what I also suspect, except I was calling this anything but fascinating.
I'll close the issue because it is not related to this library. Thank you for pointing me in the right direction 🙏
I've encountered an issue when restoring a job from a savepoint. It happens when I need to make some changes to an already existing job. The procedure is as follows:
We're running a Flink cluster in Kubernetes in session mode.
The classes I'm using as state:
And the operator uses the state as follows (I've omitted implementation details because they don't matter):
Is there something I'm doing wrong? This works with the deprecated Scala API. The exception also says the issue might be with heap state, which is what I use in production. If switching to RocksDB would solve the issue, I'd gladly make the switch.
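For reference, switching from the heap state backend to RocksDB is a configuration change. A sketch for Flink 1.18; the checkpoint URI is a placeholder for your actual storage:

```yaml
# flink-conf.yaml
# 'state.backend.type' supersedes the older 'state.backend' key
state.backend.type: rocksdb
# Placeholder URI; point this at your durable storage
state.checkpoints.dir: s3://<your-bucket>/checkpoints
# Incremental checkpoints are supported only by the RocksDB backend
state.backend.incremental: true
```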