Open · yjh0502 opened this issue 4 years ago

When trying to run sequential I/O with async-std, it seems that a new thread is spawned for every I/O request. Here's an strace log for a test workload, which shows that a thread is spawned for every request. The workload was recorded with `strace` and the output filtered with `grep`.
Thanks for the report. Could you tell us which async-std version you're using and share the code that triggers this? @yjh0502
@k-nasa Here's sample code to reproduce the problem:
```rust
use async_std::prelude::*;

fn main() {
    async_std::task::block_on(async {
        let mut f1 = async_std::fs::File::open("/dev/zero")
            .await
            .expect("File::open");
        let mut f2 = async_std::fs::File::create("/dev/null")
            .await
            .expect("File::create");
        let mut buf = [0u8; 1];
        for _ in 0..10000 {
            f1.read(&mut buf).await.expect("read");
            f2.write(&buf).await.expect("write");
        }
    });
}
```
I'm using the latest async-std (`1.5.0`), and here are the strace stats:
```
$ strace -cf ./target/debug/async-std-test 2>&1 | grep -v '^strace: Process'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.80    2.085741          28     72686      3836 futex
 10.52    0.393280           6     57470           sigaltstack
  6.68    0.249663          13     19159           munmap
  5.64    0.210648          10     19154           madvise
  4.18    0.156334           8     19188           mmap
  3.75    0.140303           7     19157           sched_getaffinity
  3.45    0.128948           6     19156           prctl
  3.14    0.117421           6     19156           clone
  2.52    0.094104           4     19157           set_robust_list
  1.63    0.060762           6     10008           read
  1.55    0.057823           5     10000           write
  1.14    0.042570           7      5466           sched_yield
  0.00    0.000039           2        17           mprotect
  0.00    0.000000           0         8           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         8           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         8           openat
  0.00    0.000000           0         2           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    3.737636                289821      3838 total
```
```
$ strace -cf ./target/release/async-std-test 2>&1 | grep -v '^strace: Process'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 55.74    2.518723          34     72651       622 futex
 10.04    0.453825           7     58040           sigaltstack
  7.11    0.321096          16     19345           madvise
  6.23    0.281487          14     19352           munmap
  4.68    0.211388          10     19384           mmap
  3.36    0.151727           7     19346           clone
  3.26    0.147177           7     19347           sched_getaffinity
  2.89    0.130757           6     19346           prctl
  2.72    0.122969           6     19347           set_robust_list
  1.51    0.068417           6     10008           read
  1.42    0.064351           6     10000           write
  1.04    0.046779           6      7577           sched_yield
  0.00    0.000083           3        23           mprotect
  0.00    0.000012           1         8           close
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         8           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           fcntl
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         8           openat
  0.00    0.000000           0         2           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    4.518791                293813       624 total
```
It reads/writes about 10k times (as expected), and around 19k threads are created and destroyed during the process (note the `clone` count).
Edit: I may have misread the code as performing a lot more `open`/`create` calls than it actually does. Note that it's actually the `spawn_blocking()` calls within `read` and `write` that appear to be causing the issue.
Here's what I think is happening: `open` and `create` are both implemented via `spawn_blocking()`, which is intended to use a thread pool. The internal thread-pool implementation keeps at least one sleeping thread running at all times (a minimal sketch of this lifecycle follows the list):

- When a thread-pool thread is started, it blocks waiting for work on a queue.
- If the wait times out (it does every second), the thread checks whether it is the last sleeping thread; if it isn't, it exits.
- After obtaining a task, the thread checks whether there are any sleeping threads left (it is no longer sleeping itself, since it has a task, so it checks that there's another), and spawns one if not.
- The task is then run.
- Another check for sleeping threads occurs: if there are others, then the thread that ran the task exits.
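A minimal sketch of that lifecycle as I understand it (not the actual async-std source; I'm assuming a `crossbeam_channel` work queue and a global counter of sleeping threads):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

use crossbeam_channel::Receiver;

type Task = Box<dyn FnOnce() + Send>;

// Number of threads currently blocked waiting on the queue.
static SLEEPING: AtomicUsize = AtomicUsize::new(0);

fn worker(queue: Receiver<Task>) {
    loop {
        // Step 1: block waiting for work, with a one-second timeout.
        SLEEPING.fetch_add(1, Ordering::SeqCst);
        let task = match queue.recv_timeout(Duration::from_secs(1)) {
            Ok(task) => task,
            Err(_) => {
                // Step 2: timed out; exit unless we were the last sleeper.
                if SLEEPING.fetch_sub(1, Ordering::SeqCst) > 1 {
                    return;
                }
                continue;
            }
        };
        // Step 3: we have a task, so we're no longer sleeping; make sure
        // at least one other thread is left waiting on the queue.
        if SLEEPING.fetch_sub(1, Ordering::SeqCst) == 1 {
            let queue = queue.clone();
            thread::spawn(move || worker(queue));
        }
        // Step 4: run the task.
        task();
        // Step 5: if someone else is sleeping, exit instead of going
        // back to wait; hence zero thread reuse under this workload.
        if SLEEPING.load(Ordering::SeqCst) > 0 {
            return;
        }
    }
}
```

The last step is what kills reuse here: the thread that just finished an operation sees the reserve sleeper and exits, the next operation wakes the sleeper, which in turn spawns a fresh reserve, and so on, one `clone` per operation.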
Putting this together with the code sample: every operation that goes through `spawn_blocking()` (I originally blamed `open` and `create`; see the edit above) wakes the reserve sleeper, spawns a replacement for it, and then exits once its task is done.

I'm not sure what the best way to resolve this is. Plausibly some different approach to the thread pool could be taken. It's not really operating as a pool right now (zero thread re-use with the code given).
I don't know how the internals work, but shouldn't there be a policy that spawns a predefined number of threads, say N, which wait for tasks to become available? Then, when M tasks are available, schedule those M tasks across the N available threads. (A rough sketch of such a fixed pool is below.)
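For illustration only, a bare-bones fixed-size pool along those lines (a hypothetical sketch, not async-std code; `fixed_pool` and `Task` are names I made up):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Task = Box<dyn FnOnce() + Send>;

// Spawn N worker threads up front; all of them block on one shared
// queue and are reused for every task instead of exiting after one.
fn fixed_pool(n: usize) -> mpsc::Sender<Task> {
    let (tx, rx) = mpsc::channel::<Task>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..n {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // Hold the lock only long enough to pull one task.
            let task = match rx.lock().unwrap().recv() {
                Ok(task) => task,
                Err(_) => return, // all senders dropped: shut down
            };
            task();
        });
    }
    tx
}
```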
I've looked back at the code again and noticed that it performs one `open` and one `create` (not many). This means that the explanation I posted previously is probably wrong (I added a note at the start about that error).
> I don't know how the internals work, but shouldn't there be a policy that spawns a predefined number of threads, say N, which wait for tasks to become available? Then, when M tasks are available, schedule those M tasks across the N available threads.
For the particular `spawn_blocking::POOL`, there are no limits. Note that `spawn_blocking` is intended for use with blocking tasks (`open` and `create` are syscalls that block and don't offer a non-blocking interface). In other words, one can only ever do 1:1 scheduling here: each in-flight blocking call occupies its own thread.

~~Note that this shouldn't apply to `read`/`write`: these have non-blocking interfaces.~~ Edit: file I/O is basically blocking on Linux even if using `NONBLOCK` flags.
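To make the 1:1 point concrete, wrapping a blocking syscall looks roughly like this (a sketch assuming an async context; if I remember right, `task::spawn_blocking` is behind the `unstable` feature flag in this version):

```rust
// One pool thread is occupied for the full duration of the blocking
// open(2); N of these in flight at once means N threads (1:1).
let file = async_std::task::spawn_blocking(|| {
    std::fs::File::open("/dev/zero")
})
.await
.expect("open failed");
```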
OK, so I've dug a bit further: it appears that we're hitting the `spawn_blocking()` call within `write()`. Here's a partial call stack:
```
#0  async_std::task::spawn_blocking::spawn_blocking (f=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/task/spawn_blocking.rs:45
#1  0x00005555555a770b in async_std::fs::file::LockGuard<async_std::fs::file::State>::poll_drain (self=..., cx=0x7fffffffd1c0)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:815
#2  0x00005555555a6c9f in async_std::fs::file::LockGuard<async_std::fs::file::State>::poll_write (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:796
#3  0x00005555555a4895 in <&async_std::fs::file::File as futures_io::if_std::AsyncWrite>::poll_write (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:374
#4  0x00005555555a46e9 in <async_std::fs::file::File as futures_io::if_std::AsyncWrite>::poll_write (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:355
#5  0x0000555555573bbd in <async_std::io::write::write::WriteFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fffffffd1c0)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/io/write/write.rs:19
```
There's also one in `read()` (again, a partial backtrace):
```
#0  async_std::task::spawn_blocking::spawn_blocking (f=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/task/spawn_blocking.rs:45
#1  0x00005555555a5d8d in async_std::fs::file::LockGuard<async_std::fs::file::State>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:703
#2  0x00005555555a4625 in <&async_std::fs::file::File as futures_io::if_std::AsyncRead>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:345
#3  0x00005555555a44f9 in <async_std::fs::file::File as futures_io::if_std::AsyncRead>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/fs/file.rs:334
#4  0x0000555555570e6f in <&mut T as futures_io::if_std::AsyncRead>::poll_read (self=..., cx=0x7fffffffd1c0, buf=...)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-io-0.3.4/src/lib.rs:326
#5  0x000055555556ab6d in <async_std::io::read::read::ReadFuture<T> as core::future::future::Future>::poll (self=..., cx=0x7fffffffd1c0)
    at /home/cody/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.5.0/src/io/read/read.rs:19
```
So my previous incorrect comment is still somewhat informative here.
@jmesmon

> Edit: file I/O is basically blocking on Linux even if using `NONBLOCK` flags.
Correct me if I'm wrong, but this should hold true for regular files on all *nix systems, since a local file is assumed to always be accessible and disk reads/writes are assumed never to be slow enough to block. But the same should not be true for "special" character devices such as /dev/zero and /dev/null; there is no reason for the kernel to block reads and writes to these files, even on Linux.

On a side note, since you're using mio, my guess is that it uses epoll internally on Linux, in which case it's important to note that epoll doesn't work on regular files. I also highly doubt that it would work on files like /dev/zero, which always has something to read.
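For what it's worth, telling the two cases apart from Rust is straightforward; a small hypothetical helper, std-only:

```rust
use std::os::unix::fs::FileTypeExt;

// epoll_ctl rejects regular files with EPERM, while poll()/select()
// just report them as always ready; character devices such as
// /dev/zero are the kind of fd a readiness API can meaningfully wait on.
fn is_char_device(path: &str) -> std::io::Result<bool> {
    Ok(std::fs::metadata(path)?.file_type().is_char_device())
}
```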
@adityashah1212 Sure, that sounds right. Internally, async-std flows all `File` I/O the same way: calling `read()` (or `write()`) within a `spawn_blocking()`, in effect spawning a thread for each of the reads/writes of normal files that occur in a user's program. As far as I can tell, there is no special-casing for non-normal (i.e. device) files.
@jmesmon Then wouldn't it be better to have two sets of APIs in two different modules, one performing blocking and the other performing non-blocking file I/O operations, and let the developer choose which one they want to use?

Edit: Actually, I think we need three sets of APIs: the two described above, plus a third that is async internally and also registers with the native readiness mechanism, like epoll on Linux and kqueue on the BSD variants and Darwin (I don't know about Windows). This one is important for cases where we are interacting with some custom device node under /dev that might block until data is available, say a Linux UIO driver. (A rough sketch of that readiness wait is below.)
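To illustrate that third flavor, here's a minimal readiness wait on a device node using `libc::poll` (a hypothetical, deliberately blocking sketch; an async runtime would register the same fd with epoll/kqueue instead of parking a thread):

```rust
use std::os::unix::io::AsRawFd;

// Wait until the file descriptor is readable. Meaningful for device
// nodes, pipes, and sockets; regular files always report ready.
fn wait_readable(file: &std::fs::File) -> std::io::Result<()> {
    let mut pfd = libc::pollfd {
        fd: file.as_raw_fd(),
        events: libc::POLLIN,
        revents: 0,
    };
    // Timeout of -1: block until the fd becomes readable.
    if unsafe { libc::poll(&mut pfd, 1, -1) } < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
```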
Yes, there are non-normal-file use cases where one would want to wait on the fd instead of spawning a separate thread for each operation. I don't know what async-std's thoughts are on where those should live.

As far as purely blocking operations go, I believe one would just use the `std::fs` types at that point instead of `async_std::fs`.
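Side by side, the two options that exist today look like this (a sketch; the second call assumes an async context):

```rust
// Blocking: plain std. Fine on a dedicated thread or inside
// spawn_blocking, but it will stall an async executor thread.
let contents = std::fs::read("/etc/hostname").expect("read");

// "Async": async_std::fs, which today dispatches the same blocking
// call to the thread pool behind the scenes.
let contents = async_std::fs::read("/etc/hostname").await.expect("read");
```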