Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Alluxio Proxy hangs up #17308

Open Haoning-Sun opened 1 year ago

Haoning-Sun commented 1 year ago

Alluxio Version: 2.7.1

Describe the bug: The service hangs when uploading a large number of files. When a file is written, a BlockWorkerClient is acquired from the client pool as soon as the OutputStream is created, and another BlockWorkerClient may be acquired later while the file is being written. Suppose every writing thread has created its OutputStream, and the pool then has no BlockWorkerClient left to hand out for the writes. In that case all writing threads wait for an available client and never exit.
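The hold-and-wait pattern described above can be reproduced in a small, self-contained model. This is a minimal sketch, not Alluxio code. It assumes a java.util.concurrent.Semaphore can stand in for the BlockWorkerClient pool (DynamicResourcePool). It also assumes each writer holds one client from stream creation and needs exactly one more mid-write. The class and method names (PoolExhaustionDemo, countStuckWriters) are hypothetical. A timed tryAcquire replaces the real indefinite wait so the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolExhaustionDemo {
    // Hypothetical model: the Semaphore stands in for the client pool. Each writer
    // holds one client from the moment its stream is created, then asks for a
    // second client mid-write. Returns how many writers time out on the second one.
    static int countStuckWriters(int writers, int poolSize) {
        if (poolSize < writers) {
            throw new IllegalArgumentException("demo assumes every writer can open a stream");
        }
        Semaphore pool = new Semaphore(poolSize);
        CountDownLatch allOpened = new CountDownLatch(writers);
        CountDownLatch allTried = new CountDownLatch(writers);
        AtomicInteger stuck = new AtomicInteger();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    pool.acquire();              // client taken when the stream is created
                    allOpened.countDown();
                    allOpened.await();           // every writer now holds one client
                    boolean gotSecond = pool.tryAcquire(200, TimeUnit.MILLISECONDS);
                    if (!gotSecond) {
                        stuck.incrementAndGet(); // without a timeout this would wait forever
                    }
                    allTried.countDown();
                    allTried.await();            // nobody releases until every writer has tried
                    pool.release(gotSecond ? 2 : 1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return stuck.get();
    }

    public static void main(String[] args) {
        System.out.println(countStuckWriters(4, 4)); // pool == writers: every writer is stuck
        System.out.println(countStuckWriters(4, 8)); // pool == 2 * writers: nobody waits
    }
}
```

With the pool exactly as large as the number of writers, every writer ends up holding one client while waiting for a second, which matches the TIMED_WAITING threads in the dump. Enlarging the pool, as in the workaround below, only helps while the thread count stays under that bound.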

To Reproduce

Expected behavior: Files upload successfully.

Urgency: It can be worked around temporarily by increasing alluxio.user.block.worker.client.pool.max.

Are you planning to fix it: I am thinking about how to acquire all of the required client resources at once.
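Acquiring all clients at once removes the "hold some, wait for the rest" step that causes the hang. The following is a minimal sketch of that idea using java.util.concurrent.Semaphore.tryAcquire(int permits, long timeout, TimeUnit unit), which takes the requested permits only when all of them are available. It assumes that a writer knows in advance how many clients it will need. It also assumes that a single shared pool models the client pool. AllAtOnceDemo and runWriters are hypothetical names, not Alluxio APIs.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AllAtOnceDemo {
    // Hypothetical model: each writer needs `clientsPerWriter` clients and takes them
    // in one call, so it never holds some clients while waiting for the rest.
    // Returns how many writers finished.
    static int runWriters(int writers, int clientsPerWriter, int poolSize) {
        if (clientsPerWriter > poolSize) {
            throw new IllegalArgumentException("one writer needs more clients than the pool holds");
        }
        Semaphore pool = new Semaphore(poolSize, true); // fair: waiters are served in FIFO order
        AtomicInteger finished = new AtomicInteger();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    // Waits until all permits are free, then takes them together.
                    if (!pool.tryAcquire(clientsPerWriter, 5, TimeUnit.SECONDS)) {
                        return; // gave up after the timeout without holding anything
                    }
                    try {
                        Thread.sleep(10); // simulate writing the file
                        finished.incrementAndGet();
                    } finally {
                        pool.release(clientsPerWriter);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        // 8 writers needing 2 clients each share a pool of 4: two run at a time, all finish.
        System.out.println(runWriters(8, 2, 4));
    }
}
```

The trade-off is that a writer has to bound its client count up front. If blocks (and therefore clients) are allocated lazily during the write, that count has to be estimated or capped before the stream is opened.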

Additional context: Most threads are in the TIMED_WAITING state, blocked at DynamicResourcePool.acquire:

"qtp293002476-1570434" #1570434 prio=5 os_prio=0 tid=0x0000562008d6b800 nid=0x4e504 waiting on condition [0x00007fa8c2fd8000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000371847a80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at alluxio.resource.DynamicResourcePool.acquire(DynamicResourcePool.java:345)
        at alluxio.resource.DynamicResourcePool.acquire(DynamicResourcePool.java:292)
        at alluxio.client.file.FileSystemContext.acquireBlockWorkerClientInternal(FileSystemContext.java:551)
        at alluxio.client.file.FileSystemContext.acquireBlockWorkerClient(FileSystemContext.java:529)
        at alluxio.client.block.stream.GrpcDataWriter.create(GrpcDataWriter.java:97)
        at alluxio.client.block.stream.DataWriter$Factory.create(DataWriter.java:94)
        at alluxio.client.block.stream.BlockOutStream.createReplicatedBlockOutStream(BlockOutStream.java:97)
        at alluxio.client.block.AlluxioBlockStore.getOutStream(AlluxioBlockStore.java:371)
        at alluxio.client.file.AlluxioFileOutStream.getNextBlock(AlluxioFileOutStream.java:301)
        at alluxio.client.file.AlluxioFileOutStream.writeInternal(AlluxioFileOutStream.java:267)
        at alluxio.client.file.AlluxioFileOutStream.write(AlluxioFileOutStream.java:234)
        at java.security.DigestOutputStream.write(DigestOutputStream.java:145)
        at com.google.common.io.ByteStreams.copy(ByteStreams.java:116)
        at alluxio.proxy.s3.S3RestServiceHandler.lambda$createObjectOrUploadPart$12(S3RestServiceHandler.java:983)
        at alluxio.proxy.s3.S3RestServiceHandler$$Lambda$712/154007767.call(Unknown Source)
        at alluxio.proxy.s3.S3RestUtils.call(S3RestUtils.java:100)
        at alluxio.proxy.s3.S3RestServiceHandler.createObjectOrUploadPart(S3RestServiceHandler.java:819)
        at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
yyongycy commented 1 year ago

Wondering whether keeping (number of S3 proxy threads) * (number of replicas) <= the max client pool size would work?
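The sizing question above can be turned into simple arithmetic. This sketch assumes a single shared pool and that each writer thread needs at most k clients at once. The comment equates k with the replica count, which is an assumption here. The first bound, threads * k, means no writer ever waits. The second is the classic single-resource deadlock-avoidance bound. If writers grab clients one at a time, a pool of at least threads * (k - 1) + 1 always leaves one writer able to finish and release its clients. The class and method names are hypothetical.

```java
public class PoolSizing {
    // Pool size at which every writer can hold its maximum number of clients
    // simultaneously, so no writer ever waits (the heuristic in the comment).
    static int noWaitPoolSize(int writerThreads, int maxClientsPerWriter) {
        return Math.multiplyExact(writerThreads, maxClientsPerWriter);
    }

    // Smallest pool size that rules out hold-and-wait deadlock: even if every writer
    // holds k - 1 clients, one client is left over, so some writer can complete.
    static int deadlockFreePoolSize(int writerThreads, int maxClientsPerWriter) {
        if (maxClientsPerWriter < 1) {
            throw new IllegalArgumentException("a writer needs at least one client");
        }
        return Math.addExact(Math.multiplyExact(writerThreads, maxClientsPerWriter - 1), 1);
    }

    public static void main(String[] args) {
        int threads = 200;         // hypothetical number of S3 proxy writer threads
        int clientsPerWriter = 2;  // hypothetical max clients one writer needs at once
        System.out.println(noWaitPoolSize(threads, clientsPerWriter));       // 400
        System.out.println(deadlockFreePoolSize(threads, clientsPerWriter)); // 201
    }
}
```

Both bounds grow with the thread count, which is why raising the pool maximum is only a workaround. Acquiring all clients at once avoids depending on that ratio.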

Haoning-Sun commented 1 year ago


It may be related to #17164, and I have already migrated that code. I have also considered how the number of threads relates to the number of clients. Still, it would be better to be able to acquire all resources at once.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.