VladTheJunior opened 7 months ago
Could you please take a profile of your program to see where it's looping?
How can I do that?
Is it enough?
7.79% tokio-runtime-w test_agent [.] memcpy
7.77% tokio-runtime-w test_agent [.] <russh::channels::io::tx::ChannelTx<S> as tokio::io::async_write::AsyncWrite>::poll_write
6.62% tokio-runtime-w test_agent [.] tokio::runtime::scheduler::multi_thread::worker::Context::run
5.77% tokio-runtime-w test_agent [.] tokio::runtime::scheduler::multi_thread::worker::<impl tokio::runtime::scheduler::multi_thread::ha
5.11% tokio-runtime-w test_agent [.] tokio::sync::batch_semaphore::Semaphore::add_permits_locked
4.53% tokio-runtime-w test_agent [.] tokio::sync::mutex::Mutex<T>::lock_owned::{{closure}}
4.52% tokio-runtime-w test_agent [.] <tokio::sync::batch_semaphore::Acquire as core::future::future::Future>::poll
3.76% tokio-runtime-w test_agent [.] __libc_free
3.42% tokio-runtime-w test_agent [.] tokio::runtime::context::with_scheduler
3.16% tokio-runtime-w test_agent [.] __unlock
2.95% tokio-runtime-w test_agent [.] <tokio::io::util::copy::Copy<R,W> as core::future::future::Future>::poll
2.79% tokio-runtime-w test_agent [.] tokio::sync::batch_semaphore::Semaphore::release
2.78% tokio-runtime-w test_agent [.] __lock
2.77% tokio-runtime-w test_agent [.] tokio::runtime::task::state::State::transition_to_running
2.46% tokio-runtime-w test_agent [.] tokio::runtime::scheduler::multi_thread::worker::Context::run_task
2.45% tokio-runtime-w test_agent [.] tokio::runtime::task::state::State::transition_to_idle
2.14% tokio-runtime-w test_agent [.] tokio::runtime::task::waker::wake_by_ref
2.04% tokio-runtime-w test_agent [.] tokio::runtime::task::state::State::ref_dec
1.81% tokio-runtime-w test_agent [.] agent::ssh::SshSession::execute::{{closure}}
1.80% tokio-runtime-w test_agent [.] sccp
1.77% tokio-runtime-w test_agent [.] __libc_malloc_impl
1.70% tokio-runtime-w test_agent [.] tokio::runtime::task::harness::Harness<T,S>::poll
1.59% tokio-runtime-w test_agent [.] tokio::runtime::context::scoped::Scoped<T>::with
1.53% tokio-runtime-w test_agent [.] alloc_slot
1.51% tokio-runtime-w test_agent [.] enframe
1.42% tokio-runtime-w test_agent [.] get_meta
1.30% tokio-runtime-w test_agent [.] tokio::runtime::task::core::Core<T,S>::poll
There is also some additional context for this problem. The root cause is probably the target device. I am trying to execute a command that sends ~200KB of data, but for some unknown reason the device receives only 40KB; after that I get 100% CPU usage on my side and a stuck process on the device running the command (sh -c /cat > /some/file). However, after I restarted the device the problem was gone, so I can't reproduce the bug right now. Still, the 100% CPU behavior is bad and could break my app again at any time (it had happened rarely a few times before, but I could only detect it yesterday, since I have thousands of devices where this command is executed). It should produce an error instead, or at least react to a timeout.
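One way to at least get timeout behavior on the caller's side is to wrap the send in tokio::time::timeout, so a wedged transfer surfaces as an error instead of hanging forever. This is only a minimal sketch and does not address the CPU spin itself; it assumes the client channel type is russh::Channel<russh::client::Msg> as in the examples below, and the helper name, error message, and time limit are made up for illustration:

use std::time::Duration;
use russh::client::Msg;
use russh::Channel;
use tokio::time::timeout;

// Hypothetical helper: send `payload` over an already-opened channel,
// but give up after `limit` instead of waiting forever.
async fn send_with_timeout(
    channel: &Channel<Msg>,
    payload: &[u8],
    limit: Duration,
) -> anyhow::Result<()> {
    match timeout(limit, channel.data(payload)).await {
        Ok(result) => Ok(result?), // send completed (or failed) within the limit
        Err(_) => anyhow::bail!("channel.data() did not finish within {limit:?}"),
    }
}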
I'm facing the same issue - 100% CPU usage (single core) when awaiting russh::Channel::data.
In comparison, ssh user@10.0.0.123 "cat > /dev/null" < /dev/urandom consumes barely any CPU.
Minimal example:
use async_trait::async_trait;
use russh::client;
use russh::client::Config;
use russh_keys::key;
use std::sync::Arc;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let addrs = ("10.0.0.123", 22);
let mut session = client::connect(Arc::new(Config::default()), addrs, Client).await?;
let auth_ok = session.authenticate_password("user", "pass").await?;
if !auth_ok {
anyhow::bail!("auth failed");
}
let channel = session.channel_open_session().await?;
channel.exec(true, "cat > /dev/null").await?;
let infinite_data = tokio::io::repeat(b'A');
println!("sending data");
channel.data(infinite_data).await?;
unreachable!();
}
struct Client;
#[async_trait]
impl client::Handler for Client {
type Error = russh::Error;
async fn check_server_key(
&mut self,
_server_public_key: &key::PublicKey,
) -> Result<bool, Self::Error> {
Ok(true)
}
}
(I'm using russh 0.44.0-beta.2.)
Flamegraph: flamegraph.svg.gz (russh is highlighted)
I generated the flamegraph using cargo flamegraph --release. I had to let it run for a few minutes to make the data method more apparent in the perf data.
I also turned on debuginfo in the release profile:
# Cargo.toml
[profile.release]
debug = true
tokio::io::repeat is really not the best choice for benchmarking as it emits bytes one by one.
Here's a more realistic example:
$ head -c 1048576000 /dev/urandom | time ssh eugene@localhost "cat > /dev/null"
ssh eugene@localhost "cat > /dev/null" 5.02s user 0.68s system 91% cpu 6.243 total
$ time cargo run --release --example test3
Finished release [optimized] target(s) in 0.15s
Running `target/release/examples/test3`
sent 1048576000 bytes in 6.553448791 seconds
cargo run --release --example test3 10.98s user 1.20s system 162% cpu 7.496 total
For me, debug = true incurs an additional 25% performance penalty.
Code:
use async_trait::async_trait;
use russh::client;
use russh::client::Config;
use russh_keys::key;
use std::sync::Arc;
use tokio::time::Instant;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let addrs = ("localhost", 22);
let mut session = client::connect(Arc::new(Config::default()), addrs, Client).await?;
let auth_ok = session.authenticate_password("eugene", "").await?;
if !auth_ok {
anyhow::bail!("auth failed");
}
let channel = session.channel_open_session().await?;
channel.exec(true, "cat > /dev/null").await?;
let mut bytes_sent = 0;
let block = [0; 1024 * 1024];
let t = Instant::now();
for _ in 0..1000 {
channel.data(&block[..]).await?;
bytes_sent += block.len();
}
let elapsed = t.elapsed().as_secs_f64(); // keep fractional seconds for the MB/s calculation
println!("sent {} bytes in {} seconds", bytes_sent, elapsed);
println!(
"{} MB per second",
(bytes_sent as f64 / elapsed / 1_000_000.0) as u64
);
Ok(())
}
struct Client;
#[async_trait]
impl client::Handler for Client {
type Error = russh::Error;
async fn check_server_key(
&mut self,
_server_public_key: &key::PublicKey,
) -> Result<bool, Self::Error> {
Ok(true)
}
}
(Also, since most of the CPU time is spent in encryption, cipher selection can account for large performance differences, e.g. about 50% between ChaCha20-Poly1305 and AES.)
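If you want to pin the cipher so that runs are comparable, the rough shape would be the sketch below. It assumes russh exposes the negotiation preferences via Config::preferred and cipher name constants as in recent versions; the exact field types and constant names may differ in 0.44.0-beta.2, so treat this purely as illustrative:

use std::sync::Arc;
use russh::client::Config;

// Sketch only: `preferred.cipher` and `russh::cipher::CHACHA20_POLY1305` are
// assumed to exist under these names; check the russh version you are actually
// benchmarking and adjust accordingly.
fn config_with_fixed_cipher() -> Arc<Config> {
    let mut config = Config::default();
    config.preferred.cipher = &[russh::cipher::CHACHA20_POLY1305];
    Arc::new(config)
}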
I have the same problem when the connection is lost while sending a large amount of data.
> tokio::io::repeat is really not the best choice for benchmarking as it emits bytes one by one.

I replaced tokio::io::repeat in my original example with just channel.data(vec![0_u8; 67108864].as_slice()).await?, and the behavior was exactly the same - 100% single-core CPU usage when sending the data, so it doesn't matter.
> Here's a more realistic example:

OpenSSH:
> $ head -c 1048576000 /dev/urandom | time ssh eugene@localhost "cat > /dev/null"
> ssh eugene@localhost "cat > /dev/null" 5.02s user 0.68s system 91% cpu 6.243 total
- 1GB per 5s of real CPU time

russh:
> $ time cargo run --release --example test3
> Finished release [optimized] target(s) in 0.15s
> Running `target/release/examples/test3`
> sent 1048576000 bytes in 6.553448791 seconds
> cargo run --release --example test3 10.98s user 1.20s system 162% cpu 7.496 total
- 1GB per 10s of real CPU time, i.e. about half as fast
> For me, debug = true incurs an additional 25% performance penalty.
Sending data over a slow network takes orders of magnitude more time than any encryption, even in debug mode. The expected behavior is that the SSH implementation sleeps and does nothing while the data are being transmitted over the wire.
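As a generic illustration of that expected pattern in tokio (not russh code): when a writer hits backpressure, the task should be parked until the socket is writable again, so it consumes no CPU while the kernel send buffer drains. For example, with a plain TcpStream:

use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

// write_all only makes progress as the kernel accepts bytes; when the send
// buffer is full the task is suspended (no polling, no CPU) until the reactor
// reports the socket writable again.
async fn send_all(stream: &mut TcpStream, data: &[u8]) -> std::io::Result<()> {
    stream.write_all(data).await
}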
$ head -c 67108864 /dev/zero | time ssh user@10.0.0.123 "cat > /dev/null"
ssh user@10.0.0.123 "cat > /dev/null" 0.26s user 0.27s system 1% cpu 33.915 total
russh (debug):
$ time cargo run --example ssh_copy
Finished dev [unoptimized + debuginfo] target(s) in 0.13s
Running `target/debug/examples/ssh_copy`
sending data
cargo run --example ssh_copy 29.09s user 0.16s system 117% cpu 24.881 total
russh (release):
$ time cargo run --example ssh_copy --release
Finished release [optimized] target(s) in 0.13s
Running `target/release/examples/ssh_copy`
sending data
cargo run --example ssh_copy --release 24.98s user 0.18s system 99% cpu 25.276 total
OpenSSH is using 1% CPU, while russh is using 117% CPU in debug mode and 99% CPU in release mode.
The behavior of OpenSSH is as expected - roughly 1% of the time it's actually doing anything, and 99% of the time it just sleeps while the data are being sent over the wire. russh, on the other hand, is wasting CPU 100% of the time. This means that there must be some sort of busy loop in the implementation of the russh::Channel::data future.
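For context, this is the classic shape of such a busy loop in async Rust (a generic illustration, not russh's actual code): a future that wakes its own waker and returns Poll::Pending whenever it can't make progress gets re-polled immediately, so the task spins instead of sleeping until the I/O source is actually ready.

use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll};

// Illustration only (not russh code): a future that "waits" for a flag by
// waking itself on every poll, which makes the executor re-poll it in a tight
// loop and pegs one core at 100%.
struct BusyWait {
    done: Arc<AtomicBool>,
}

impl Future for BusyWait {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.done.load(Ordering::Acquire) {
            Poll::Ready(())
        } else {
            // Self-wake + Pending means "poll me again immediately": a spin
            // loop dressed up as a future. The fix is to hand the waker to the
            // real event source (socket readiness, tokio::sync::Notify, ...)
            // and wake only when progress is actually possible.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}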
ssh_copy example
Recently I got strange behavior in my app: it can get stuck at 100% CPU usage forever, and only a restart of the process fixes it. After some investigation, I found that it happens here:
Debug log:
log.txt