Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Improve master RPC capacity when using an HDFS-compatible system #17031

Open. flaming-archer opened this issue 1 year ago.

flaming-archer commented 1 year ago

Is your feature request related to a problem? Please describe.
When using Alluxio to accelerate Presto, we found that RPC requests to the Alluxio master were piling up heavily.

Describe the solution you'd like

Describe alternatives you've considered
Try to fix it myself and find where the performance bottleneck is.

Urgency
Urgent

Additional context
Alluxio cluster deployment, used to speed up Presto queries.

flaming-archer commented 1 year ago

Monitoring data at the start, before the fixes:

[Screenshots: 2023-03-08, 12:05:15 PM and 12:05:21 PM]
flaming-archer commented 1 year ago

I fixed this with three PRs:
https://github.com/Alluxio/alluxio/pull/17006
https://github.com/Alluxio/alluxio/pull/16893
https://github.com/Alluxio/alluxio/pull/16944

Test results on our side after applying them:

[Screenshots: 2023-03-08, 12:09:45 PM and 12:10:31 PM]

I think the three PRs could be merged. 😄 @HelloHorizon @elega @jiacheliu3

The idea is that the listStatus and getFileInfo calls made by Presto do not need mount-point information (mountinfo). When a directory has thousands of files and there are 500 mount points, skipping it can reduce the number of mount-point resolution calls by at least hundreds of thousands.
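To illustrate why skipping mount resolution matters, here is a minimal sketch under stated assumptions: mount points are kept in a flat list and resolved by a linear scan, and a hypothetical `need_mount_info` flag tells a hypothetical `list_status` whether to resolve each child. `LinearMountTable`, `list_status`, and `need_mount_info` are illustrative names, not Alluxio's API, and this is not the code in the PRs above. With 2,000 children and 500 mount points, one listing costs 2,000 × 500 = 1,000,000 prefix checks when mount info is requested, and none when it is skipped.

```python
class LinearMountTable:
    """Hypothetical mount table resolved by scanning every mount point."""

    def __init__(self, mount_points):
        self.mount_points = list(mount_points)
        self.comparisons = 0  # counts prefix checks, a proxy for resolution cost

    def resolve(self, path):
        """Return the longest mount point that is a prefix of `path` ("/" by default)."""
        best = "/"
        for mp in self.mount_points:
            self.comparisons += 1
            is_prefix = path == mp or path.startswith(mp.rstrip("/") + "/")
            if is_prefix and len(mp) > len(best):
                best = mp
        return best


def list_status(table, directory, children, need_mount_info):
    """Hypothetical listStatus: resolve mount info per child only when asked to."""
    results = []
    for name in children:
        path = f"{directory}/{name}"
        mount = table.resolve(path) if need_mount_info else None
        results.append((path, mount))
    return results
```

A caller like Presto that only needs file names and sizes would pass `need_mount_info=False`, so the cost of a listing no longer scales with the number of mount points.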

Judging from the metrics, RPC processing capacity has improved substantially compared with before.

flaming-archer commented 1 year ago

However, performance is still not sufficient, especially right after Alluxio starts. I found that Alluxio synchronizes inodes (metadata sync) after restarting. This process is particularly slow, and it also resolves mount points.

[Screenshot: 2023-03-08, 12:18:08 PM]
elega commented 1 year ago

Hi, I can help with the review.

flaming-archer commented 1 year ago

> Hi, I can help with the review.

Thanks. I found that the master starts synchronizing because we configured metadata sync to run once a day: alluxio.user.file.metadata.sync.interval=1d. When the master starts, its cache is empty, so a sync is triggered according to this setting.
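The behavior described above can be modeled with a small decision function. This is a hypothetical sketch, not Alluxio's implementation: it assumes last-sync times are kept only in memory (so the record is empty after a restart), that an interval of -1 means "never sync automatically" and 0 means "sync on every access" (the documented meanings of `alluxio.user.file.metadata.sync.interval`), and that a path with no record is treated as stale. `SyncTimeCache` and `should_sync` are illustrative names.

```python
from typing import Dict

NEVER = -1  # interval -1: no automatic sync, only explicit on-demand syncs
ALWAYS = 0  # interval 0: sync on every access


class SyncTimeCache:
    """Hypothetical in-memory record of when each path was last synced.

    Because it lives only in memory, it starts empty after a master restart."""

    def __init__(self):
        self._last_sync_ms: Dict[str, int] = {}

    def record(self, path: str, now_ms: int) -> None:
        self._last_sync_ms[path] = now_ms

    def should_sync(self, path: str, interval_ms: int, now_ms: int) -> bool:
        if interval_ms == NEVER:
            return False
        if interval_ms == ALWAYS:
            return True
        last = self._last_sync_ms.get(path)
        if last is None:
            return True  # no record, e.g. right after a restart: treat as stale
        return now_ms - last >= interval_ms
```

Under these assumptions, with a one-day interval every path accessed right after a restart is synced once, which matches the slow startup described above; an interval of -1 avoids that at the cost of syncing manually.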

elega commented 1 year ago

FYI, we are also working on better mount-point resolution using a trie. You can also try setting the sync interval to -1 and triggering syncs yourself on demand.
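A trie keyed by path components turns mount-point resolution into a longest-prefix walk whose cost depends on path depth rather than on the number of mount points. The sketch below is a generic illustration of that technique, not the Alluxio implementation being worked on; `MountTrie`, `add`, and `resolve` are hypothetical names.

```python
class _Node:
    __slots__ = ("children", "mount")

    def __init__(self):
        self.children = {}  # path component -> child node
        self.mount = None   # normalized mount-point path if a mount ends here


class MountTrie:
    """Hypothetical mount table resolved by longest-prefix match over components."""

    def __init__(self):
        self.root = _Node()
        self.root.mount = "/"  # the root is always mounted

    @staticmethod
    def _components(path):
        return [c for c in path.split("/") if c]

    def add(self, mount_point):
        node = self.root
        comps = self._components(mount_point)
        for comp in comps:
            node = node.children.setdefault(comp, _Node())
        node.mount = "/" + "/".join(comps)

    def resolve(self, path):
        """Walk the path once, remembering the deepest mount point seen."""
        node, best = self.root, self.root.mount
        for comp in self._components(path):
            node = node.children.get(comp)
            if node is None:
                break
            if node.mount is not None:
                best = node.mount
        return best
```

Matching on whole components also avoids treating /mnt/m7 as a prefix of /mnt/m70, a pitfall of naive string-prefix checks.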

flaming-archer commented 1 year ago

@elega @jiacheliu3 Could you also help me review https://github.com/Alluxio/alluxio/pull/16944? Some test cases failed, but I don't think the failures are caused by my change.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.