Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0
6.84k stars 2.94k forks

The worker's client resource pool is exhausted, causing a read-write deadlock #16413

Open yuyang733 opened 1 year ago

yuyang733 commented 1 year ago

Alluxio Version: Alluxio 2.4.0+

Describe the bug

alluxio.user.block.worker.client.pool.max is limited to 1024. When a file being written spans more than 1024 blocks, all read and write operations from the client are blocked.

I took a closer look at the code and found that each DataWriter holds a connection related to a BlockOutStream.

[Screenshot 2022-10-29 02 16 18] [Screenshot 2022-10-29 02 23 09]

However, while a large file is being written, BlockOutStream is not released. Instead, it is cached in mPreviousBlockOutStreams, as the following code segment shows:

[Screenshot 2022-10-29 02 28 47]

It is not released until the file is closed.

[Screenshot 2022-10-29 02 29 39]
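To make the failure mode concrete, here is a minimal toy model of it in Java. It is not Alluxio code: the class PoolExhaustionSketch, the BlockStream stand-in, and the writeBlocks method are all hypothetical names, and a plain Semaphore stands in for the worker client pool. The sketch assumes each block stream holds exactly one pooled client until it is closed, which is how the behavior is described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Toy model of a bounded client pool; hypothetical, not Alluxio code. */
public class PoolExhaustionSketch {

  /** Hypothetical stand-in for a per-block stream that holds one pooled client. */
  static final class BlockStream implements AutoCloseable {
    private final Semaphore pool;
    private boolean released;

    BlockStream(Semaphore pool) {
      this.pool = pool;
    }

    @Override
    public void close() {
      if (!released) {
        released = true;
        pool.release(); // hand the client back to the pool
      }
    }
  }

  /**
   * Tries to write {@code blocks} blocks against a pool of {@code poolMax} clients.
   * Returns how many blocks obtained a client before the pool ran dry.
   */
  public static int writeBlocks(int poolMax, int blocks, boolean releaseEagerly) {
    Semaphore pool = new Semaphore(poolMax);
    List<BlockStream> previous = new ArrayList<>(); // mimics a "previous streams" cache
    int written = 0;
    try {
      for (int i = 0; i < blocks; i++) {
        // A real client would block here; the sketch fails fast instead of hanging.
        if (!pool.tryAcquire()) {
          break;
        }
        BlockStream stream = new BlockStream(pool);
        written++;
        if (releaseEagerly) {
          stream.close(); // client returns to the pool after each block
        } else {
          previous.add(stream); // client stays held until the whole file is closed
        }
      }
      return written;
    } finally {
      for (BlockStream s : previous) {
        s.close(); // "file close": only now are the cached streams released
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(writeBlocks(1024, 1030, false)); // prints 1024
    System.out.println(writeBlocks(1024, 1030, true));  // prints 1030
  }
}
```

With the streams cached until close, the 1025th block cannot obtain a client. The eager-release variant is included only for contrast; it is not a claim about how Alluxio should be changed, since the streams may be retained until close for a reason.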

So, I'm curious why it's designed like this.

To Reproduce

Write a single file larger than blockSize * 1024.
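As a rough sizing aid, the snippet below computes that threshold. The 64 MB block size is an assumption used for illustration only; substitute whatever block size your client is configured with. The class and method names are hypothetical.

```java
/** Computes the file size at which a write would need more clients than the pool holds. */
public class ReproSizeSketch {

  /** Returns blockSizeBytes * poolMax, failing loudly on overflow. */
  public static long thresholdBytes(long blockSizeBytes, int poolMax) {
    return Math.multiplyExact(blockSizeBytes, (long) poolMax);
  }

  public static void main(String[] args) {
    long blockSize = 64L * 1024 * 1024; // ASSUMPTION: 64 MB blocks; check your own setting
    int poolMax = 1024;                 // the limit reported in this issue
    long threshold = thresholdBytes(blockSize, poolMax);
    // prints "68719476736 bytes = 64 GiB"
    System.out.println(threshold + " bytes = " + (threshold >> 30) + " GiB");
  }
}
```

Under that assumption, any single file larger than 64 GiB would need more than 1024 block streams and should trigger the hang.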

Expected behavior

DeadLock for 100 days.

Urgency

Very urgent!

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.