atsign-foundation / at_server

The software implementation of Atsign's core technology
https://docs.atsign.com
BSD 3-Clause "New" or "Revised" License

Exception on Prod secondary #591

Closed by kumarnarendra701 2 years ago

kumarnarendra701 commented 2 years ago

Describe the bug

Observed a "FileSystemException: lock failed" exception on prod secondaries. Atsign: @starlingelectoral (3775)

Logs:

FileSystemException: lock failed, path = 'storage/commitLog/commit_log_07c5c43c9946f85a84f695d4805915415539ca79f40798fb73e8905d949c3cc1.lock' (OS Error: Resource temporarily unavailable, errno = 11)
#0      _RandomAccessFile.lock.<anonymous closure> (dart:io/file_impl.dart:1002)
<asynchronous suspension>
#1      StorageBackendVm.initialize (package:hive/src/backend/vm/storage_backend_vm.dart:81)
<asynchronous suspension>
#2      HiveImpl._openBox (package:hive/src/hive_impl.dart:110)
<asynchronous suspension>
#3      HiveImpl.openBox (package:hive/src/hive_impl.dart:140)
<asynchronous suspension>
#4      HiveBase.openBox (package:at_persistence_secondary_server/src/keystore/hive_base.dart:34)
<asynchronous suspension>
#5      CommitLogKeyStore.initialize (package:at_persistence_secondary_server/src/log/commitlog/commit_log_keystore.dart:36)
<asynchronous suspension>
#6      HiveBase.init (package:at_persistence_secondary_server/src/keystore/hive_base.dart:15)
<asynchronous suspension>
#7      AtCommitLogManagerImpl.getCommitLog (package:at_persistence_secondary_server/src/log/commitlog/at_commit_log_manager_impl.dart:27)
<asynchronous suspension>
#8      AtSecondaryServerImpl._initializePersistentInstances (package:at_secondary/src/server/at_secondary_impl.dart:372)
<asynchronous suspension>
#9      AtSecondaryServerImpl.start (package:at_secondary/src/server/at_secondary_impl.dart:136)
<asynchronous suspension>
#10     SecondaryServerBootStrapper.run (package:at_secondary/src/server/bootstrapper.dart:50)
<asynchronous suspension>
#11     main (file:///app/at_secondary_server/bin/main.dart:19)
<asynchronous suspension>
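
For reference, errno 11 on Linux is EAGAIN ("Resource temporarily unavailable"), which a non-blocking lock request returns when another open handle already holds a conflicting lock on the same file. The following is a minimal Python sketch of that condition, not the Hive or Dart implementation: it uses `flock` as a stand-in (the locking call behind `RandomAccessFile.lock` is not shown in this trace), and `try_lock`, `demo`, and `example.lock` are hypothetical names for illustration only.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and request an exclusive, non-blocking lock.

    Returns (fd, None) on success, or (None, errno) if the lock is refused.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd, None
    except OSError as exc:
        os.close(fd)
        return None, exc.errno


def demo():
    """Lock a file, try to lock it again, then retry after releasing it."""
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "example.lock")  # hypothetical file name

        # First handle takes the lock.
        holder_fd, first_err = try_lock(lock_path)

        # Second handle on the same file is refused while the first is open.
        _, second_err = try_lock(lock_path)

        # Closing the holder releases the lock, so a new attempt succeeds.
        os.close(holder_fd)
        released_fd, third_err = try_lock(lock_path)
        os.close(released_fd)

        return first_err, second_err, third_err


if __name__ == "__main__":
    first, second, third = demo()
    print("first attempt errno:", first)
    print("second attempt errno:", second, "(EAGAIN is", errno.EAGAIN, ")")
    print("attempt after release errno:", third)
```

In this sketch the second attempt fails only while the first handle is still open, and succeeds once that handle is closed.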
kumarnarendra701 commented 2 years ago

As suggested by @kalluriramkumar, I deleted the .lock file and scaled the service down and back up, but the same error still appears in the logs and the .lock file is recreated. To stop the noise, I have scaled the service down to zero.

gkc commented 2 years ago

I scaled the service down, removed all of the Hive .lock files, and scaled the service back up. All is well.
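
The cleanup step above could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually run on prod: `remove_lock_files` is a hypothetical helper, and the `storage` root is inferred from the path in the logged exception. It is meant to run only while the service is scaled down, so that no process is using the files.

```python
from pathlib import Path


def remove_lock_files(storage_root):
    """Delete every *.lock file under storage_root and return the removed paths.

    Assumes the service is scaled down, so nothing holds these files open.
    """
    removed = []
    for lock_file in sorted(Path(storage_root).rglob("*.lock")):
        if lock_file.is_file():
            lock_file.unlink()
            removed.append(lock_file)
    return removed


if __name__ == "__main__":
    # "storage" is an assumed root, taken from the path in the exception above.
    for path in remove_lock_files("storage"):
        print("removed", path)
```

Only files ending in `.lock` are touched; other files under the storage directory are left in place.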