Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

About direct io option for Alluxio Fuse #12718

Closed DdMad closed 1 year ago

DdMad commented 3 years ago

Page https://docs.alluxio.io/os/user/stable/en/api/POSIX-API.html

Summary I found this in the document:

"Note that the direct_io mount option is set by default so that writes and reads bypass the kernel page cache and go directly to Alluxio."

So my understanding is that, by default, whenever we read data through Alluxio FUSE, it shouldn't use the system cache.

However, when I test with my Alluxio:

  1. When we read 10GB of data for the first time, there are 2 copies: Alluxio grows by 10GB and the system cache grows by 10GB, so 20GB of memory is used in total (this is unexpected, since the direct_io option is set by default and reads shouldn't use the system cache)
  2. When we read the same 10GB of data again, neither Alluxio nor the system cache grows (this is expected)
  3. If, before reading the same 10GB of data again, I clear the system cache with sync; echo 3 > /proc/sys/vm/drop_caches, the system cache does not grow during the re-read (this is expected, but a bit odd given 1)
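
One way to check the observations above is to sample the `Cached` field of /proc/meminfo before and after a read. The following is a minimal sketch, assuming a Linux host; the helper names (`cached_kib`, `page_cache_growth_kib`, `read_through`) and the mount path in the usage note are hypothetical and not part of Alluxio.

```python
import re


def cached_kib(meminfo_text):
    """Return the value of the 'Cached:' line of /proc/meminfo text, in KiB."""
    # ^ with MULTILINE anchors at a line start, so 'SwapCached:' is not matched.
    match = re.search(r"^Cached:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Cached:' line found")
    return int(match.group(1))


def read_meminfo(path="/proc/meminfo"):
    """Read the current contents of /proc/meminfo (Linux only)."""
    with open(path) as f:
        return f.read()


def read_through(path, chunk_size=1 << 20):
    """Read a file sequentially in 1 MiB chunks and discard the data."""
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass


def page_cache_growth_kib(read_file, meminfo_reader=read_meminfo):
    """Call read_file() and return how much 'Cached' grew, in KiB."""
    before = cached_kib(meminfo_reader())
    read_file()
    after = cached_kib(meminfo_reader())
    return after - before
```

For example, `page_cache_growth_kib(lambda: read_through("/mnt/alluxio-fuse/some-file"))` would report the growth for one read, where the path is a placeholder for a file under the FUSE mount point. `Cached` is a system-wide counter, so other activity on the host also moves it; the number is only indicative.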

So I wonder whether case 1 is normal (if so, maybe we should mention it in the document), or whether I did something wrong with the configuration, settings, or environment?

Thanks.

apc999 commented 3 years ago

@DdMad thanks for raising the question.

Are you testing the Alluxio POSIX interface? Did you enable JNI FUSE (e.g., did you set alluxio.fuse.jnifuse.enabled=true?), or are you still using the default JNR FUSE? With JNI FUSE enabled, the system cache is used; with the default JNR FUSE, direct_io is enforced.

DdMad commented 3 years ago

@apc999 thanks for replying. Yes, I'm testing the Alluxio POSIX interface, and I didn't set alluxio.fuse.jnifuse.enabled=true, so I guess JNR Fuse is used by default? However, you mentioned that "with default JNR Fuse, direct_io is enforced", so does that mean it shouldn't use the system cache?

BTW, I checked the [List of Configuration Properties](https://docs.alluxio.io/os/user/stable/en/reference/Properties-List.html) and I didn't see the config property alluxio.fuse.jnifuse.enabled. What does this config property do?

Thanks a lot!

apc999 commented 3 years ago

@DdMad you can check out the latest edge release, 2.4.1-2, which has JNI Fuse bundled. Check out the docs: https://docs.alluxio.io/os/user/edge/en/api/POSIX-API.html

Meanwhile, we are releasing 2.4.1-3 shortly, which makes JNI Fuse even easier to use.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

jja725 commented 1 year ago

Will close it for now; feel free to reopen it and contact us if this is still valid.