Apache Ignite stops working after running for a week #11397

Open vinaygangaraj opened 1 month ago

vinaygangaraj commented 1 month ago

I am using Apache Ignite (2.15.0) in a .NET 6 project to run an Ignite cache on a single partition node. It ran without any issues, but after 7 days of runtime it stopped on its own, and when I try to start the Ignite cache again I see the errors below in the log file.

The only workaround I have found is to delete the "Cache-Storage" folder (shown in the error details below) and then start the Ignite cache, after which it works fine. Any help in resolving this issue will be greatly appreciated. Thank you!

[10:20:11,026][INFO][main][CheckpointMarkersStorage] Read checkpoint status [startMarker=C:\Cache-Storage\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\cp\1726231973291-a70928e6-62e4-4152-96e6-e03ceb7493d4-START.bin, endMarker=C:\Cache-Storage\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\cp\1726231973291-a70928e6-62e4-4152-96e6-e03ceb7493d4-END.bin]
[10:20:11,026][INFO][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24804, tableSize=1.9 MiB, replacementSize=3.1 KiB, checkpointBuffer=100.0 MiB]
[10:20:11,026][INFO][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=WALPointer [idx=103, fileOff=46331083, len=50617], lastMarked=WALPointer [idx=103, fileOff=46331083, len=50617], lastCheckpointId=a70928e6-62e4-4152-96e6-e03ceb7493d4]
[10:20:11,026][INFO][main][FilePageStoreManager] Cleanup cache stores [total=0, left=0, cleanFiles=false]
[10:20:11,026][SEVERE][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: WAL history is too short [descs=[FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000118.wal, idx=118], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000119.wal, idx=119], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000120.wal, idx=120], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000121.wal, idx=121], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000122.wal, idx=122], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000123.wal, idx=123], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000124.wal, idx=124], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000125.wal, idx=125]], start=WALPointer [idx=103, fileOff=46331083, len=50617]]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3039)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2911)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1082)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1049)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.read(FileWriteAheadLogManager.java:1037)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.performBinaryMemoryRestore(GridCacheDatabaseSharedManager.java:2119)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:865)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:599)
    at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:43)
    at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:74)
[10:20:11,026][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: WAL history is too short [descs=[FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000118.wal, idx=118], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000119.wal, idx=119], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000120.wal, idx=120], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000121.wal, idx=121], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000122.wal, idx=122], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000123.wal, idx=123], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000124.wal, idx=124], FileDescriptor [file=C:\Cache-Storage\db\wal\archive\node00-34f88250-59aa-4aee-9f9f-aaf8f11cc8aa\0000000000000125.wal, idx=125]], start=WALPointer [idx=103, fileOff=46331083, len=50617]]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3039)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2911)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1082)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:1049)
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.read(FileWriteAheadLogManager.java:1037)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.performBinaryMemoryRestore(GridCacheDatabaseSharedManager.java:2119)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:865)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3094)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1120)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1725)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1647)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1089)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:599)
    at org.apache.ignite.internal.processors.platform.PlatformAbstractBootstrap.start(PlatformAbstractBootstrap.java:43)
    at org.apache.ignite.internal.processors.platform.PlatformIgnition.start(PlatformIgnition.java:74)
[10:20:11,041][WARNING][main][IgniteKernal] Attempt to stop starting grid. This operation cannot be guaranteed to be successful.
[10:20:11,045][INFO][main][GridTcpRestProtocol] Command protocol successfully stopped: TCP binary
[10:20:11,045][INFO][main][FilePageStoreManager] Cleanup cache stores [total=0, left=0, cleanFiles=false]
[10:20:11,059][INFO][main][IgniteKernal] 

>>> +----------------------------------------------------------------------------------+
>>> Ignite ver. 2.15.0#20230425-sha1:f98f7f35de6dc76a9b69299154afaa2139a5ec6d stopped OK
>>> +----------------------------------------------------------------------------------+
>>> Grid uptime: 00:00:03.654

shishkovilja commented 3 weeks ago

@vinaygangaraj, it seems that some files were deleted. Can you attach the log from the initial node failure?
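
For reference, the gap can be read directly from the exception message above: the replay wants to start at `WALPointer [idx=103, ...]`, while the archive descriptors listed in the message only cover `idx=118` through `idx=125`. Below is a minimal Python sketch of that check. The function name `missing_wal_segments` and the shortened sample message (paths trimmed, only two descriptors kept) are illustrative assumptions and not part of Ignite.

```python
import re


def missing_wal_segments(message: str) -> list[int]:
    """List segment indexes from the start pointer up to (but excluding)
    the oldest archived segment named in a 'WAL history is too short' message."""
    # Archived segments appear as: FileDescriptor [file=...NNN.wal, idx=N]
    available = sorted(int(i) for i in re.findall(r"\.wal, idx=(\d+)\]", message))
    # The position replay must start from: start=WALPointer [idx=N, ...
    start = re.search(r"start=WALPointer \[idx=(\d+)", message)
    if start is None or not available:
        return []  # message does not have the expected shape
    # Empty when the start index is already covered by the oldest archived segment
    return list(range(int(start.group(1)), available[0]))


# Shortened, hypothetical version of the message from the log (paths trimmed).
sample = (
    "WAL history is too short [descs=["
    "FileDescriptor [file=0000000000000118.wal, idx=118], "
    "FileDescriptor [file=0000000000000125.wal, idx=125]], "
    "start=WALPointer [idx=103, fileOff=46331083, len=50617]]"
)

print(missing_wal_segments(sample))
```

For the message in this issue, the result is segments 103 through 117: these are needed to replay from the last checkpoint but are not among the archive files listed. The sketch only inspects the text of the message; it does not tell you why those segments are absent.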