Open · yjh0502 opened this issue 4 years ago

When trying to run sequential I/O with async-std, it seems that a new thread is spawned for every I/O request. Here's an strace log for a test workload, which shows that a thread is spawned for every request. The workload was recorded with `strace` and the output filtered with `grep`.
Thanks for the report. Could you tell us which async-std version you're using and share the code that triggers this? @yjh0502
@k-nasa Here's sample code to reproduce the problem:
```rust
use async_std::prelude::*;

fn main() {
    async_std::task::block_on(async {
        let mut f1 = async_std::fs::File::open("/dev/zero")
            .await
            .expect("File::open");
        let mut f2 = async_std::fs::File::create("/dev/null")
            .await
            .expect("File::create");
        let mut buf = [0u8; 1];
        for _ in 0..10000 {
            f1.read(&mut buf).await.expect("read");
            f2.write(&buf).await.expect("write");
        }
    });
}
```
I'm using the latest async-std (`1.5.0`), and here are the strace stats:
```
$ strace -cf ./target/debug/async-std-test 2>&1 | grep -v '^strace: Process'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.80    2.085741          28     72686      3836 futex
 10.52    0.393280           6     57470           sigaltstack
  6.68    0.249663          13     19159           munmap
  5.64    0.210648          10     19154           madvise
  4.18    0.156334           8     19188           mmap
  3.75    0.140303           7     19157           sched_getaffinity
  3.45    0.128948           6     19156           prctl
  3.14    0.117421           6     19156           clone
  2.52    0.094104           4     19157           set_robust_list
  1.63    0.060762           6     10008           read
  1.55    0.057823           5     10000           write
  1.14    0.042570           7      5466           sched_yield
  0.00    0.000039           2        17           mprotect
  0.00    0.000000           0         8           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         8           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         8           openat
  0.00    0.000000           0         2           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    3.737636                289821      3838 total
```
```
$ strace -cf ./target/release/async-std-test 2>&1 | grep -v '^strace: Process'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.74    2.518723          34     72651       622 futex
 10.04    0.453825           7     58040           sigaltstack
  7.11    0.321096          16     19345           madvise
  6.23    0.281487          14     19352           munmap
  4.68    0.211388          10     19384           mmap
  3.36    0.151727           7     19346           clone
  3.26    0.147177           7     19347           sched_getaffinity
  2.89    0.130757           6     19346           prctl
  2.72    0.122969           6     19347           set_robust_list
  1.51    0.068417           6     10008           read
  1.42    0.064351           6     10000           write
  1.04    0.046779           6      7577           sched_yield
  0.00    0.000083           3        23           mprotect
  0.00    0.000012           1         8           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         8           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         8           openat
  0.00    0.000000           0         2           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    4.518791                293813       624 total
```
It reads/writes about 10k times (as expected), and around 19k threads are created and destroyed during the process (note the `clone` count).
Edit: I may have misread the code as performing a lot more `open`/`create` calls than it actually does. Note that it's actually the `spawn_blocking()` calls within `read` and `write` that appear to be causing the issue.
Here's what I think is happening: `open` and `create` are both implemented via `spawn_blocking()`, which is intended to use a thread pool. The internal thread-pool implementation keeps at least one sleeping thread running at all times (a minimal sketch of this lifecycle follows the list):

- When a thread-pool thread is started, it blocks waiting for work on a queue.
- If the wait times out (it does every second), the thread checks whether it is the last sleeping thread; if it isn't, it exits.
- After obtaining a task, the thread checks whether there are any sleeping threads left (it is no longer sleeping itself, since it has a task, so it checks that there's another), and spawns one if not.
- The task is then run.
- Another check for sleeping threads occurs: if there are others, then the thread that ran the task exits.
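A minimal sketch of that lifecycle as I understand it (not the actual async-std source; I'm assuming a `crossbeam_channel` work queue and a global counter of sleeping threads):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

use crossbeam_channel::Receiver;

type Task = Box<dyn FnOnce() + Send>;

// Number of threads currently blocked waiting on the queue.
static SLEEPING: AtomicUsize = AtomicUsize::new(0);

fn worker(queue: Receiver<Task>) {
    loop {
        // Step 1: block waiting for work, with a one-second timeout.
        SLEEPING.fetch_add(1, Ordering::SeqCst);
        let task = match queue.recv_timeout(Duration::from_secs(1)) {
            Ok(task) => task,
            Err(_) => {
                // Step 2: timed out; exit unless we were the last sleeper.
                if SLEEPING.fetch_sub(1, Ordering::SeqCst) > 1 {
                    return;
                }
                continue;
            }
        };
        // Step 3: we have a task, so we're no longer sleeping; make sure
        // at least one other thread is left waiting on the queue.
        if SLEEPING.fetch_sub(1, Ordering::SeqCst) == 1 {
            let queue = queue.clone();
            thread::spawn(move || worker(queue));
        }
        // Step 4: run the task.
        task();
        // Step 5: if someone else is sleeping, exit instead of going
        // back to wait; hence zero thread reuse under this workload.
        if SLEEPING.load(Ordering::SeqCst) > 0 {
            return;
        }
    }
}
```

The last step is what kills reuse here: the thread that just finished an operation sees the reserve sleeper and exits, the next operation wakes the sleeper, which in turn spawns a fresh reserve, and so on, one `clone` per operation.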
Putting this together with the code sample: every operation that goes through `spawn_blocking()` (I originally blamed `open` and `create`; see the edit above) wakes the reserve sleeper, spawns a replacement for it, and then exits once its task is done.

I'm not sure what the best way to resolve this is. Plausibly some different approach to the thread pool could be taken. It's not really operating as a pool right now (zero thread re-use with the code given).
I don't know how the internals work, but shouldn't there be a policy that spawns a predefined number of threads, say N, which wait for tasks to become available? Then, when M tasks are available, schedule those M tasks across the N available threads. (A rough sketch of such a fixed pool is below.)
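For illustration only, a bare-bones fixed-size pool along those lines (a hypothetical sketch, not async-std code; `fixed_pool` and `Task` are names I made up):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Task = Box<dyn FnOnce() + Send>;

// Spawn N worker threads up front; all of them block on one shared
// queue and are reused for every task instead of exiting after one.
fn fixed_pool(n: usize) -> mpsc::Sender<Task> {
    let (tx, rx) = mpsc::channel::<Task>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..n {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // Hold the lock only long enough to pull one task.
            let task = match rx.lock().unwrap().recv() {
                Ok(task) => task,
                Err(_) => return, // all senders dropped: shut down
            };
            task();
        });
    }
    tx
}
```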
I've looked back at the code again and noticed that it performs one `open` and one `create` (not many). This means that the explanation I posted previously is probably wrong (I added a note at the start about that error).
> I don't know how the internals work, but shouldn't there be a policy that spawns a predefined number of threads, say N, which wait for tasks to become available? Then, when M tasks are available, schedule those M tasks across the N available threads.
For the particular `spawn_blocking::POOL`, there are no limits. Note that `spawn_blocking` is intended for use with blocking tasks (`open` and `create` are syscalls that block and don't offer a non-blocking interface). In other words, one can only ever do 1:1 scheduling here: each in-flight blocking call occupies its own thread.

~~Note that this shouldn't apply to `read`/`write`: these have non-blocking interfaces.~~ Edit: file I/O is basically blocking on Linux even if using `NONBLOCK` flags.
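To make the 1:1 point concrete, wrapping a blocking syscall looks roughly like this (a sketch assuming an async context; if I remember right, `task::spawn_blocking` is behind the `unstable` feature flag in this version):

```rust
// One pool thread is occupied for the full duration of the blocking
// open(2); N of these in flight at once means N threads (1:1).
let file = async_std::task::spawn_blocking(|| {
    std::fs::File::open("/dev/zero")
})
.await
.expect("open failed");
```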
OK, so I've dug a bit further: it appears that we're hitting the `spawn_blocking()` call within `write()`. Here's a partial call stack:
```
#0  async_std::task::spawn_blocking::spawn_blocking (f=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/task/spawn_blocking.rs:45
#1  0x00005555555a770b in async_std::fs::file::LockGuard<async_std::fs::file::State>::poll_drain (self=..., cx=0x7fffffffd1c0)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:815
#2  0x00005555555a6c9f in async_std::fs::file::LockGuard<async_std::fs::file::State>::poll_write (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:796
#3  0x00005555555a4895 in <&async_std::fs::file::File as futures_io::if_std::AsyncWrite>::poll_write (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:374
#4  0x00005555555a46e9 in <async_std::fs::file::File as futures_io::if_std::AsyncWrite>::poll_write (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:355
#5  0x0000555555573bbd in <async_std::io::write::write::WriteFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fffffffd1c0)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/io/write/write.rs:19
```
There's also one in `read()` (again, a partial backtrace):
```
#0  async_std::task::spawn_blocking::spawn_blocking (f=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/task/spawn_blocking.rs:45
#1  0x00005555555a5d8d in async_std::fs::file::LockGuard<async_std::fs::file::State>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:703
#2  0x00005555555a4625 in <&async_std::fs::file::File as futures_io::if_std::AsyncRead>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:345
#3  0x00005555555a44f9 in <async_std::fs::file::File as futures_io::if_std::AsyncRead>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:334
#4  0x0000555555570e6f in <&mut T as futures_io::if_std::AsyncRead>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-io-0.3.4/src/lib.rs:326
#5  0x000055555556ab6d in <async_std::io::read::read::ReadFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fffffffd1c0)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/io/read/read.rs:19
```
So my previous incorrect comment is still somewhat informative here.
@jmesmon

> Edit: file I/O is basically blocking on Linux even if using `NONBLOCK` flags.
Correct me if I'm wrong, but this should hold true for regular files on all *nix systems, since a local file is assumed to always be accessible and disk reads/writes are assumed never to be slow enough to block. But the same should not be true for "special" character devices such as /dev/zero and /dev/null; there is no reason for the kernel to block reads and writes to these files, even on Linux.

On a side note, since you're using mio, my guess is that it uses epoll internally on Linux, in which case it's important to note that epoll doesn't work on regular files. I also highly doubt that it would work on files like /dev/zero, which always has something to read.
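For what it's worth, telling the two cases apart from Rust is straightforward; a small hypothetical helper, std-only:

```rust
use std::os::unix::fs::FileTypeExt;

// epoll_ctl rejects regular files with EPERM, while poll()/select()
// just report them as always ready; character devices such as
// /dev/zero are the kind of fd a readiness API can meaningfully wait on.
fn is_char_device(path: &str) -> std::io::Result<bool> {
    Ok(std::fs::metadata(path)?.file_type().is_char_device())
}
```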
@adityashah1212 Sure, that sounds right. Internally, async-std flows all `File` I/O the same way: calling `read()` (or `write()`) within a `spawn_blocking()`, in effect spawning a thread for each of the reads/writes of normal files that occur in a user's program. As far as I can tell, there is no special-casing for non-normal (i.e. device) files.
@jmesmon Then wouldn't it be better to have two sets of APIs in two different modules, one performing blocking and the other performing non-blocking file I/O operations, and let the developer choose which one they want to use?

Edit: Actually, I think we need three sets of APIs: the two described above, plus a third that is async internally and also registers with the native readiness mechanism, like epoll on Linux and kqueue on the BSD variants and Darwin (I don't know about Windows). This one is important for cases where we are interacting with some custom device node under /dev that might block until data is available, say a Linux UIO driver. (A rough sketch of that readiness wait is below.)
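To illustrate that third flavor, here's a minimal readiness wait on a device node using `libc::poll` (a hypothetical, deliberately blocking sketch; an async runtime would register the same fd with epoll/kqueue instead of parking a thread):

```rust
use std::os::unix::io::AsRawFd;

// Wait until the file descriptor is readable. Meaningful for device
// nodes, pipes, and sockets; regular files always report ready.
fn wait_readable(file: &std::fs::File) -> std::io::Result<()> {
    let mut pfd = libc::pollfd {
        fd: file.as_raw_fd(),
        events: libc::POLLIN,
        revents: 0,
    };
    // Timeout of -1: block until the fd becomes readable.
    if unsafe { libc::poll(&mut pfd, 1, -1) } < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```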
Yes, there are non-normal-file use cases where one would want to wait on the fd instead of spawning a separate thread for each operation. I don't know what async-std's thoughts are on where those should live.

As far as purely blocking operations go, I believe one would just use the `std::fs` types at that point instead of `async_std::fs`.
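Side by side, the two options that exist today look like this (a sketch; the second call assumes an async context):

```rust
// Blocking: plain std. Fine on a dedicated thread or inside
// spawn_blocking, but it will stall an async executor thread.
let contents = std::fs::read("/etc/hostname").expect("read");

// "Async": async_std::fs, which today dispatches the same blocking
// call to the thread pool behind the scenes.
let contents = async_std::fs::read("/etc/hostname").await.expect("read");
```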