Open XuankuF opened 3 years ago
@loovia .alluxio_ufs_persistence is a staging directory for async-through writes via Alluxio, used before the data is persisted to the UFS. Which UFS type are you using, and do you see temp files staged there?
In addition, how did you define your table in Hive, and how did you drop it? If the Hive table points to files in the UFS (say HDFS) rather than in Alluxio, dropping the table only removes the files from the UFS, and you may see files left over in Alluxio. I am asking just out of curiosity.
All my tables have the location 'alluxio://path/to/table', because the database path is mounted in Alluxio.
I often saw individual files that were not persisted to the UFS (HDFS) when I used ASYNC_THROUGH, and temp files exist in .alluxio_ufs_persistence. After the table is generated, I delete it. After a few days, .alluxio_ufs_persistence becomes like this.
A few days ago, I set alluxio.user.file.replication.max=0 and used CACHE_THROUGH, and the storage usage of the workers seems to have met expectations. I mean that when alluxio.user.file.replication.max=-1, the replicas are not deleted when the table is deleted (this is my guess).
Once the table is deleted, the corresponding partition files in Alluxio and the associated blocks should be removed at the same time.
Can you share your master.log file? I would like to see whether there are any failures when removing these temp files on the master side, and the reason for them.
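Before sharing the whole log, it may help to pull out only the lines that look relevant. The following is a minimal sketch, not an Alluxio tool: it assumes the staging directory name appears verbatim in the log lines and that each line carries a plain-text level such as WARN or ERROR. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: the staging directory name from this thread appears verbatim in log lines.
STAGING_DIR = ".alluxio_ufs_persistence"
# Assumption: log lines carry a plain-text level such as WARN or ERROR.
LEVELS = ("WARN", "ERROR")


def find_staging_failures(lines):
    """Return the lines that mention the staging directory at WARN or ERROR level."""
    return [
        line.rstrip("\n")
        for line in lines
        if STAGING_DIR in line and any(level in line for level in LEVELS)
    ]


def scan_log(path):
    """Read a log file and return the matching lines."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_staging_failures(handle)
```

Calling scan_log with the path to your master.log returns the matching lines as a list, which is easier to paste into the issue than the full file.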
@loovia is it possible you restarted your masters (perhaps a few times) before these async writes complete?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.
Alluxio Version: 2.4.1-2
Describe the bug: We store temporary table data in Alluxio and drop the table after the job is done. After a few days, worker storage is 50% occupied, which is unexpected. What is the .alluxio_ufs_persistence path? It takes up more than 400G and has 26220 files.
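To track whether the directory keeps growing, the file count and total size can be recomputed periodically. This is a rough sketch that assumes the staging directory is reachable as an ordinary local filesystem path (for example through a mount); that setup is an assumption and is not confirmed in this thread. The function name is hypothetical.

```python
import os


def summarize_tree(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(full_path)
            except OSError:
                # The file vanished or is unreadable (temp files may be
                # removed while the scan is running), so skip it.
                continue
            file_count += 1
    return file_count, total_bytes
```

Running it a few days apart and comparing the two results shows whether temp files are accumulating or being cleaned up.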