apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Temporary working folders are left behind on Middle Managers after tasks complete #12332

Open sergioferragut opened 2 years ago

sergioferragut commented 2 years ago

Affected Version

Apache Druid 0.22.1

Description

This problem was originally reported here: https://www.druidforum.org/t/temp-folder-size-was-increasing-due-to-that-peons-processing-taking-more-time-how-to-clear-temp-folder-automatically/7139

I was able to reproduce it on a small minikube deployment by running the vanilla Wikipedia index_parallel ingestion a few times, each with a different target datasource name. After the jobs completed, the tasks' temporary folders were not removed. After 3 runs, the ~/var/tmp folder still contained three empty folders:

~/var/tmp $ ls -l
total 12
drwx------    2 druid    druid         4096 Mar 14 23:39 druid-realtime-persist1040350100896362009
drwx------    2 druid    druid         4096 Mar 14 23:32 druid-realtime-persist668375622911252079
drwx------    2 druid    druid         4096 Mar 14 23:34 druid-realtime-persist944793843865837077
~/var/tmp $ ls -l druid-realtime-persist944793843865837077
total 0
~/var/tmp $ ls -l druid-realtime-persist668375622911252079
total 0
~/var/tmp $ ls -l druid-realtime-persist1040350100896362009
total 0

The original report on the Druid Forum described thousands of such folders being left behind.

lejinghu commented 2 years ago

We saw this in our clusters too. Timed-out queries also leave tmp folders behind. As a workaround, we clean them up with cron jobs.
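A cron-based cleanup along those lines might look roughly like the sketch below. It is not part of Druid and is not the script the commenter uses: the `cleanup_druid_tmp` name, the one-hour default age, and the `~/var/tmp` default path (taken from the reproduction above) are all assumptions. It only removes *empty* `druid-realtime-persist*` directories, and it relies on a `find` that supports `-mindepth`, `-maxdepth`, `-empty` and `-mmin` (GNU findutils does; BusyBox only if built with those options).

```shell
#!/bin/sh
# Hypothetical cleanup helper (not part of Druid): removes leftover, empty
# druid-realtime-persist* directories that have not been modified recently.
# Assumes find(1) supports -mindepth, -maxdepth, -empty and -mmin.

# cleanup_druid_tmp DIR [MIN_AGE_MINUTES]
cleanup_druid_tmp() {
  dir=$1
  min_age=${2:-60}   # skip anything modified within the last hour by default

  # Nothing to do if the directory does not exist.
  [ -d "$dir" ] || return 0

  # Direct children only, directories only, empty ones only, old ones only.
  # -print logs each removal; rmdir refuses non-empty directories as a safety net.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'druid-realtime-persist*' \
    -empty -mmin +"$min_age" -print -exec rmdir {} \;
}

# Directory and age come from $1/$2; defaults are assumptions.
cleanup_druid_tmp "${1:-$HOME/var/tmp}" "${2:-60}"
```

A crontab entry such as `0 * * * * /opt/scripts/cleanup-druid-tmp.sh /opt/druid/var/tmp 60` (hypothetical paths) would run it hourly. The age threshold exists so that directories of still-running tasks are left alone; a value comfortably longer than the longest-running task seems the safer choice.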

github-actions[bot] commented 9 months ago

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions[bot] commented 8 months ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

asdf2014 commented 8 months ago

I recommend reopening this issue. I have also encountered this problem, and it can cause ingestion tasks to fail once the disk fills up. Since it is a resource leak, it could be a significant concern :sweat_smile:

asdf2014 commented 8 months ago

Hi @sergioferragut, have you had a chance to check the ~/var/druid/task/ directory? I found many outdated single_phase_sub_task_xxx directories containing druid-input-entity-xxx.tmp files, which is worse than the empty tmp folders.
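To check whether a Middle Manager is accumulating the leftovers described in this comment, a read-only listing like the sketch below might help. It is not part of Druid: the `list_stale_input_entities` name, the one-day default age, and the assumption that stale files are named `druid-input-entity-*.tmp` somewhere below a `single_phase_sub_task_*` directory (mirroring the names reported here) are all illustrative. It only prints paths and deletes nothing, because the directories of running tasks live in the same place.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of Druid): list druid-input-entity-*.tmp
# files left under single_phase_sub_task_* directories in the task directory.
# Read-only: it prints paths and never deletes anything.
# Assumes find(1) supports -mmin (GNU findutils and BSD find do).

# list_stale_input_entities TASK_DIR [MIN_AGE_MINUTES]
list_stale_input_entities() {
  task_dir=$1
  min_age=${2:-1440}   # default: not modified for at least one day

  [ -d "$task_dir" ] || return 0

  # -path matches the whole path, so '*' may span several directory levels.
  find "$task_dir" -type f \
    -path '*/single_phase_sub_task_*' \
    -name 'druid-input-entity-*.tmp' \
    -mmin +"$min_age" -print
}

# Default path taken from the comment above (~/var/druid/task/).
list_stale_input_entities "${1:-$HOME/var/druid/task}" "${2:-1440}"
```

Piping the output to `wc -l` gives a quick count; the exact layout inside each task directory may differ between Druid versions.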

asdf2014 commented 8 months ago

@abhishekagarwal87, do you have any ideas on this one? :smile:

abhishekagarwal87 commented 7 months ago

What version are you on? I don't see such folders on my local box. Can you post the ingestion spec you are running?

asdf2014 commented 7 months ago

Hi @abhishekagarwal87, it was the same version that @sergioferragut mentioned in this issue (0.22.1). And yes, this is indeed a rare event. Now that we are using MoK mode with the latest version of Druid, this issue no longer affects us :sweat_smile: