Describe the bug
Some Alluxio workers crashed, and the files stored on the crashed workers were marked as lost. After some time the workers came back online, but the files stay lost even though they are 100% in Alluxio, and they are never persisted to Ceph.
To Reproduce
Stop one Alluxio worker, wait until its files are marked as lost, then bring the worker back online.
Expected behavior
When a file is marked as lost but is 100% in Alluxio again, it should move to the TO_BE_PERSISTED state and be persisted to Ceph.
Are you planning to fix it
In progress.
Searching the project, I found logic in the class LostFileDetector that turns files to LOST:
    if (inode.getPersistenceState() != PersistenceState.PERSISTED) {
      mInodeTree.updateInode(journalContext, UpdateInodeEntry.newBuilder()
          .setId(inode.getId())
          .setPersistenceState(PersistenceState.LOST.name())
          .build());
    }
But there is no logic that turns files from LOST back to NOT_PERSISTED once the worker is back online.
I'm not sure whether the logic that turns files from LOST to PersistenceState.NOT_PERSISTED belongs in this class, or whether somewhere else would be more suitable.
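To make the missing transition concrete, here is a minimal, stand-alone sketch of the decision I have in mind. It does not use any Alluxio classes: `LostFileRecoverySketch`, its local `PersistenceState` enum, `nextState`, and the `blockLocationCounts` input are all hypothetical names that only mirror the states mentioned above. A real fix would presumably apply the result through `mInodeTree.updateInode` with a journal context, as in the snippet above, and where to trigger it is still the open question.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of the proposed transition; none of these types are Alluxio classes. */
final class LostFileRecoverySketch {

  /** Local stand-in for the persistence states named in this issue. */
  enum PersistenceState { NOT_PERSISTED, TO_BE_PERSISTED, PERSISTED, LOST }

  /**
   * Decides the state of a file after its block locations are re-checked.
   *
   * @param current the file's current persistence state
   * @param blockLocationCounts hypothetical input: for each block of the file,
   *        the number of live workers that currently hold it
   * @return NOT_PERSISTED if the file was LOST and every block is available again,
   *         otherwise the unchanged current state
   */
  static PersistenceState nextState(PersistenceState current, List<Integer> blockLocationCounts) {
    if (current != PersistenceState.LOST) {
      return current; // only LOST files are candidates for recovery
    }
    for (int count : blockLocationCounts) {
      if (count <= 0) {
        return PersistenceState.LOST; // at least one block is still missing
      }
    }
    return PersistenceState.NOT_PERSISTED; // every block is back in Alluxio
  }

  public static void main(String[] args) {
    // Worker is back: both blocks have a location again.
    System.out.println(nextState(PersistenceState.LOST, Arrays.asList(1, 2))); // prints NOT_PERSISTED
    // Second block still has no live location.
    System.out.println(nextState(PersistenceState.LOST, Arrays.asList(1, 0))); // prints LOST
  }
}
```

Once a file is back in NOT_PERSISTED, whatever normally schedules persistence for it could pick it up again; the sketch deliberately stops before that step.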
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
Alluxio Version: Alluxio 2.6.0, 2.6.1, 2.8.0