We can avoid hitting file descriptor (FD) limits when reading in parallel.
We do not load the file from disk into memory until it is needed.
It's "supposed" to be faster.
Is mmap really better?
~I tried to simply memcpy the mapped memory to another buffer, but the OS seemed to just skip loading the real data, and I didn't find an easy way to ensure the data is loaded.~ That was a bug in my test program. With it fixed, mmap performs the same as multi-threaded fread for both cached and cold files.
From the S3 tests, I am seeing a slight performance improvement with mmap over plain fread.
- Tracing the S3 tests shows an improvement from 7.5 secs to 6.8 secs in the part where our IO threads are doing intensive work. However, I assume this is not really due to the difference between mmap and fread, as they are very likely not the bottleneck of the process.
How will mmap affect our memory usage?
- Needs testing and tracing. `mmap` loads data on demand, and we control how much data is read into memory, so it "SHOULD" have no effect on our memory usage. However, I am not sure; I'll run some tracing and check the memory usage to make sure.
- Used `top` to track memory usage and found that the OS loads the data into memory and keeps it until we unmap the memory. Fixed the memory issue by mapping only the needed part. The latest change no longer affects memory usage.
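
To illustrate what "mapping only the needed part" can look like, here is a minimal POSIX sketch. It is not the code in this PR: the helper name `read_range_mmap` and its signature are hypothetical, and it assumes a regular file on a POSIX system. It rounds the requested offset down to a page boundary, maps only the pages covering the range, copies the bytes out, and unmaps right away so the pages are not kept resident by the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of this PR): copy `len` bytes starting at
 * `offset` of the file at `path` into `dst`, mapping only the pages that
 * cover that range. Returns 0 on success, -1 on error. */
static int read_range_mmap(const char *path, off_t offset, size_t len, void *dst) {
    if (len == 0) {
        return 0;
    }
    if (offset < 0) {
        return -1;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    /* Reject ranges past end of file: touching mapped pages that lie
     * entirely beyond EOF raises SIGBUS. */
    struct stat st;
    if (fstat(fd, &st) != 0 || offset > st.st_size ||
        (uintmax_t)len > (uintmax_t)(st.st_size - offset)) {
        close(fd);
        return -1;
    }

    /* mmap requires the file offset to be a multiple of the page size. */
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);
    size_t delta = (size_t)(offset - aligned);
    size_t map_len = len + delta;

    void *addr = mmap(NULL, map_len, PROT_READ, MAP_PRIVATE, fd, aligned);
    /* The mapping stays valid after the descriptor is closed, so the FD
     * is only held for the duration of this call. */
    close(fd);
    if (addr == MAP_FAILED) {
        return -1;
    }

    memcpy(dst, (const char *)addr + delta, len);

    /* Unmap immediately so the mapping does not pin these pages. */
    return munmap(addr, map_len);
}
```

A caller would invoke this once per part, for example `read_range_mmap(path, part_offset, part_size, buffer)`, where `part_offset`, `part_size`, and `buffer` are placeholders for whatever the upload logic uses.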
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Why mmap?
Is mmap really better?
fread

| Run | Secs | Gb/s | Mb/s | GiB/s | MiB/s |
|-----|------|------|------|-------|-------|
| 1 | 12.924 | 19.9 | 19940.2 | 2.3 | 2377.1 |
| 2 | 9.986 | 25.8 | 25806.7 | 3.0 | 3076.4 |
| 3 | 9.382 | 27.5 | 27466.3 | 3.2 | 3274.2 |
| 4 | 10.024 | 25.7 | 25707.0 | 3.0 | 3064.5 |
| 5 | 8.134 | 31.7 | 31683.2 | 3.7 | 3776.9 |
| 6 | 9.643 | 26.7 | 26722.7 | 3.1 | 3185.6 |
| 7 | 9.860 | 26.1 | 26136.3 | 3.0 | 3115.7 |
| 8 | 9.008 | 28.6 | 28606.2 | 3.3 | 3410.1 |
| 9 | 9.545 | 27.0 | 26998.6 | 3.1 | 3218.5 |
| 10 | 8.926 | 28.9 | 28869.0 | 3.4 | 3441.5 |