Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

It's slow to read from Aliyun OSS to a worker when reading data for the first time. #13833

Open jingcheng88 opened 3 years ago

jingcheng88 commented 3 years ago

Alluxio Version: 2.6.0

Describe the bug

Download file size: 1.6G

Download file directly from OSS: about 25 s

Download from Alluxio without cache: about 135 s

Download from Alluxio with cache: about 9 s
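For reference, the reported timings can be converted into approximate throughput. This is a quick sketch that assumes "1.6G" means 1600 MB (decimal); the labels and variable names are illustrative and not part of the original report.

```python
# Convert the reported download times into rough throughput figures.
# Assumption: "1.6G" is taken as 1600 MB (decimal); labels are illustrative.
FILE_SIZE_MB = 1600

timings_s = {
    "direct from OSS": 25,
    "Alluxio, cold (no cache)": 135,
    "Alluxio, warm (cached)": 9,
}


def throughput_mb_s(size_mb, seconds):
    """Average throughput in MB/s for a transfer of size_mb taking seconds."""
    return size_mb / seconds


throughputs = {
    name: throughput_mb_s(FILE_SIZE_MB, seconds)
    for name, seconds in timings_s.items()
}

# How many times slower the cold Alluxio read is than reading OSS directly.
cold_slowdown = timings_s["Alluxio, cold (no cache)"] / timings_s["direct from OSS"]

for name, mb_s in throughputs.items():
    print(f"{name}: {mb_s:.1f} MB/s")
print(f"cold read is {cold_slowdown:.1f}x slower than direct OSS")
```

Under that size assumption, the cold read through Alluxio averages roughly 12 MB/s against 64 MB/s directly from OSS, about 5.4 times slower.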

apc999 commented 3 years ago

Too little information is available here. Can you provide more details, e.g., how to reproduce your tests?

jingcheng88 commented 3 years ago

I use the Python S3 client (boto) to download files from the Alluxio proxy. The first download is very slow. The UFS is Aliyun OSS.
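A minimal reproduction sketch of this kind of measurement might look like the following. It is not the reporter's script: it assumes boto3 rather than the legacy boto package, a proxy at `localhost:39999` exposing the S3 API under `/api/v1/s3`, placeholder credentials and region, and hypothetical bucket and key names.

```python
import time


def timed_read(open_stream, chunk_size=8 * 1024 * 1024):
    """Drain a file-like stream and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    stream = open_stream()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def open_via_proxy(bucket, key, endpoint="http://localhost:39999/api/v1/s3"):
    """Open an object through the Alluxio proxy's S3 API.

    The endpoint, credentials, and region below are assumptions for
    illustration; adjust them to the actual deployment.
    """
    import boto3  # imported lazily so timed_read works without boto3 installed

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id="placeholder",
        aws_secret_access_key="placeholder",
        region_name="us-east-1",
    )
    return s3.get_object(Bucket=bucket, Key=key)["Body"]


def compare_cold_and_warm(open_stream):
    """Read the same object twice and report both timings.

    The first read is the cold one; the second is expected to be faster
    only if the first read populated the Alluxio cache.
    """
    results = {}
    for label in ("cold", "warm"):
        size, seconds = timed_read(open_stream)
        results[label] = (size, seconds)
        print(f"{label}: {size} bytes in {seconds:.2f}s")
    return results


# Example call ("my-bucket" and "big-file.bin" are hypothetical names):
# compare_cold_and_warm(lambda: open_via_proxy("my-bucket", "big-file.bin"))
```

Passing the stream opener as a callable keeps the timing logic independent of the client, so the same two functions could time a direct OSS read or a read from a FUSE mount for comparison.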

apc999 commented 3 years ago

The Python S3 client is not actively maintained today. Can you try using the Alluxio Java client or the CLI and measure the speed?

jingcheng88 commented 3 years ago

@apc999 Do you mean the Alluxio proxy is not actively maintained? As I understand it, the Python S3 client just sends a request to the Alluxio proxy, and the proxy then fetches the file from an Alluxio worker, acting as an Alluxio client. I have also tried the FUSE client. It is slow on the first read too, but faster than the proxy.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

TommyLike commented 11 months ago

@jingcheng88 any updates on this issue?
