apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0
762 stars 267 forks source link

[Hotfix] Meet NPE when loading task runtimes #2999

Closed tcodehuber closed 1 week ago

tcodehuber commented 1 week ago

Why are the changes needed?

If we upgrade AMS but there are tables whose status is running, AMS can not start normally for the deserialization error. The root cause is that TableRuntime#completeProcessL283 optimizingProcess.getTargetSnapshotId() optimizingProcess has not been initialized yet. Exception stack:

Caused by: java.io.InvalidClassException: org.apache.amoro.table.TableMetaStore; local class incompatible: stream classdesc serialVersionUID = -7257151990323524190, local class serialVersionUID = 8582766959166530544
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) ~[?:1.8.0_412]
    at java.util.HashMap.readObject(HashMap.java:1418) ~[?:1.8.0_412]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_412]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_412]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_412]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_412]
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2322) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503) ~[?:1.8.0_412]
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461) ~[?:1.8.0_412]
    at org.apache.amoro.utils.SerializationUtil.simpleDeserialize(SerializationUtil.java:71) ~[amoro-core-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    ... 15 more
2024-07-03 05:13:37,646 ERROR [main] [org.apache.amoro.server.AmoroServiceContainer] [] - AMS start error
org.apache.amoro.server.exception.PersistenceException: java.lang.NullPointerException
    at org.apache.amoro.server.exception.AmoroRuntimeException.wrap(AmoroRuntimeException.java:141) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.persistence.PersistentBase.doAsTransaction(PersistentBase.java:65) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.persistence.StatedPersistentBase.invokeConsistency(StatedPersistentBase.java:51) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.table.TableRuntime.completeProcess(TableRuntime.java:280) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.lambda$persistProcessCompleted$12(OptimizingQueue.java:655) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_412]
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647) ~[?:1.8.0_412]
    at org.apache.amoro.server.persistence.PersistentBase.doAsTransaction(PersistentBase.java:61) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.optimizing.OptimizingQueue.access$400(OptimizingQueue.java:73) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.persistProcessCompleted(OptimizingQueue.java:643) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.close(OptimizingQueue.java:422) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.loadTaskRuntimes(OptimizingQueue.java:687) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.optimizing.OptimizingQueue$TableOptimizingProcess.<init>(OptimizingQueue.java:392) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.optimizing.OptimizingQueue.initTableRuntime(OptimizingQueue.java:116) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_412]
    at org.apache.amoro.server.optimizing.OptimizingQueue.<init>(OptimizingQueue.java:109) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.DefaultOptimizingService.lambda$loadOptimizingQueues$0(DefaultOptimizingService.java:141) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_412]
    at org.apache.amoro.server.DefaultOptimizingService.loadOptimizingQueues(DefaultOptimizingService.java:131) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.DefaultOptimizingService.access$300(DefaultOptimizingService.java:87) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.DefaultOptimizingService$TableRuntimeHandlerImpl.initHandler(DefaultOptimizingService.java:460) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.table.RuntimeHandlerChain.initialize(RuntimeHandlerChain.java:60) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.table.DefaultTableService.initialize(DefaultTableService.java:364) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.AmoroServiceContainer.startService(AmoroServiceContainer.java:150) ~[amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]
    at org.apache.amoro.server.AmoroServiceContainer.main(AmoroServiceContainer.java:108) [amoro-ams-server-0.7-SNAPSHOT.jar:0.7-SNAPSHOT]

Brief change log

-

How was this patch tested?

Documentation

zhoujinsong commented 1 week ago

Hi, @tcodehuber

Can you help to provide more error stack information about where the null exception is thrown? I believe it can help us to understand why the change can fix this issue.

tcodehuber commented 1 week ago

Hi, @tcodehuber

Can you help to provide more error stack information about where the null exception is thrown? I believe it can help us to understand why the change can fix this issue.

amoro-ams.log

@zhoujinsong , I have tested in our env, NPE occurs here: image