ethercrab-rs / ethercrab

EtherCAT master written in pure Rust
258 stars 25 forks source link

ek1100.rs example: thread 'main' panicked at 'Init: WorkingCounter ... #63

Closed jubeormk1 closed 1 year ago

jubeormk1 commented 1 year ago

I just started learning how to use this library by running the ek1100.rs example.

I have a simple topology: Ubuntu 22.04 Server minimized and:

Cloning this repo to have stable support as commented in the matrix channel (very quick response! Thanks for that!) and running the mentioned example I get the next error:

[2023-07-11T02:09:06Z INFO  ek1100] Starting EK1100 demo...
[2023-07-11T02:09:06Z INFO  ek1100] Ensure an EK1100 is the first slave, with any number of modules connected after
[2023-07-11T02:09:06Z INFO  ek1100] Run with RUST_LOG=ethercrab=debug or =trace for debug information
[2023-07-11T02:09:06Z DEBUG ethercrab::slave] Slave 0x1000 name EK1100
[2023-07-11T02:09:07Z DEBUG ethercrab::slave] Slave 0x1001 name EL2004
[2023-07-11T02:09:07Z DEBUG ethercrab::client] Configuring topology/distributed clocks
thread 'main' panicked at 'Init: WorkingCounter { expected: 2, received: 1, context: Some("Broadcast time") }', examples/ek1100.rs:83:10

I build the example with cargo build --example ek1100 --release.

After that run it by sudo RUST_BACKTRACE=full RUST_LOG=debug ./target/release/examples/ek1100 eth0 where eth0 is the name of the NIC where the EtherCAT equipment is plugged.

For some more context this is the full stack trace:

thread 'main' panicked at 'Init: WorkingCounter { expected: 2, received: 1, context: Some("Broadcast time") }', examples/ek1100.rs:83:10
stack backtrace:
   0:     0x560e53c1e23a - std::backtrace_rs::backtrace::libunwind::trace::h9a6b80bbf328ba5d
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x560e53c1e23a - std::backtrace_rs::backtrace::trace_unsynchronized::hd162ec543a11886b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x560e53c1e23a - std::sys_common::backtrace::_print_fmt::h78a5099be12f51a6
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x560e53c1e23a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::ha1c5390454d74f71
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x560e53c3a8af - core::fmt::write::h9ffde816c577717b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/fmt/mod.rs:1254:17
   5:     0x560e53c1b9d5 - std::io::Write::write_fmt::h88186074961638e4
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/io/mod.rs:1698:15
   6:     0x560e53c1e005 - std::sys_common::backtrace::_print::h184198273ed08d59
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x560e53c1e005 - std::sys_common::backtrace::print::h1b4d8e7add699453
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x560e53c1f6ae - std::panicking::default_hook::{{closure}}::h393bcea75423915a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:269:22
   9:     0x560e53c1f455 - std::panicking::default_hook::h48c64f31d8b3fd03
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:288:9
  10:     0x560e53c1fc0e - std::panicking::rust_panic_with_hook::hafdc493a79370062
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:691:13
  11:     0x560e53c1fb09 - std::panicking::begin_panic_handler::{{closure}}::h0a64bc82e36bedc7
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:582:13
  12:     0x560e53c1e6a6 - std::sys_common::backtrace::__rust_end_short_backtrace::hc203444fb7416a16
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
  13:     0x560e53c1f862 - rust_begin_unwind
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
  14:     0x560e53a69193 - core::panicking::panic_fmt::h0f6ef0178afce4f2
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
  15:     0x560e53a695e3 - core::result::unwrap_failed::h8090202169109f9c
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
  16:     0x560e53a78798 - tokio::runtime::park::CachedParkThread::block_on::he68682098847e7b5
  17:     0x560e53a78c36 - tokio::runtime::context::runtime::enter_runtime::h57a07d4b664d4d90
  18:     0x560e53adb101 - tokio::runtime::runtime::Runtime::block_on::h29109ff0d856e13d
  19:     0x560e53a6e78f - ek1100::main::h1550c917b3a86bf5
  20:     0x560e53adc756 - std::sys_common::backtrace::__rust_begin_short_backtrace::h15abfc1a7efe2366
  21:     0x560e53a70511 - std::rt::lang_start::{{closure}}::h5e04481cc6f084b8
  22:     0x560e53c176fe - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hb1327dc2ef3fecdf
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:287:13
  23:     0x560e53c176fe - std::panicking::try::do_call::h4044173225fe83dd
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:485:40
  24:     0x560e53c176fe - std::panicking::try::hd8a722c09d156a53
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:449:19
  25:     0x560e53c176fe - std::panic::catch_unwind::hd2ca07971cf0119b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panic.rs:140:14
  26:     0x560e53c176fe - std::rt::lang_start_internal::{{closure}}::h26d89d595cf47b70
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/rt.rs:148:48
  27:     0x560e53c176fe - std::panicking::try::do_call::hf47aa1aa005e5f1a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:485:40
  28:     0x560e53c176fe - std::panicking::try::h73d246b2423eaf4e
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:449:19
  29:     0x560e53c176fe - std::panic::catch_unwind::hbaaeae8f1b2f9915
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panic.rs:140:14
  30:     0x560e53c176fe - std::rt::lang_start_internal::h76f3e81e6b8f13f9
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/rt.rs:148:20
  31:     0x560e53a6e865 - main
  32:     0x7f3343d77d90 - <unknown>
  33:     0x7f3343d77e40 - __libc_start_main
  34:     0x560e53a69775 - _start
  35:                0x0 - <unknown>

I am also testing the c wrapper ethercat-rs so I had to install the IgH modules but all of that is disabled and stopped. I tried the example in interfaces previously configured as ec_master and others unassigned to ethercat with the same results.

I will post more information with more testing.

jubeormk1 commented 1 year ago

Same problem with a single EL1004 slave:

[2023-07-11T02:43:25Z INFO  ek1100] Starting EK1100 demo...
[2023-07-11T02:43:25Z INFO  ek1100] Ensure an EK1100 is the first slave, with any number of modules connected after
[2023-07-11T02:43:25Z INFO  ek1100] Run with RUST_LOG=ethercrab=debug or =trace for debug information
[2023-07-11T02:43:25Z DEBUG ethercrab::slave] Slave 0x1000 name EK1100
[2023-07-11T02:43:25Z DEBUG ethercrab::slave] Slave 0x1001 name EL1014
[2023-07-11T02:43:25Z DEBUG ethercrab::client] Configuring topology/distributed clocks
thread 'main' panicked at 'Init: WorkingCounter { expected: 2, received: 1, context: Some("Broadcast time") }', examples/ek1100.rs:83:10

stacktrace:

stack backtrace:
   0:     0x558ab686f23a - std::backtrace_rs::backtrace::libunwind::trace::h9a6b80bbf328ba5d
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x558ab686f23a - std::backtrace_rs::backtrace::trace_unsynchronized::hd162ec543a11886b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x558ab686f23a - std::sys_common::backtrace::_print_fmt::h78a5099be12f51a6
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:65:5
   3:     0x558ab686f23a - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::ha1c5390454d74f71
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x558ab688b8af - core::fmt::write::h9ffde816c577717b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/fmt/mod.rs:1254:17
   5:     0x558ab686c9d5 - std::io::Write::write_fmt::h88186074961638e4
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/io/mod.rs:1698:15
   6:     0x558ab686f005 - std::sys_common::backtrace::_print::h184198273ed08d59
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:47:5
   7:     0x558ab686f005 - std::sys_common::backtrace::print::h1b4d8e7add699453
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:34:9
   8:     0x558ab68706ae - std::panicking::default_hook::{{closure}}::h393bcea75423915a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:269:22
   9:     0x558ab6870455 - std::panicking::default_hook::h48c64f31d8b3fd03
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:288:9
  10:     0x558ab6870c0e - std::panicking::rust_panic_with_hook::hafdc493a79370062
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:691:13
  11:     0x558ab6870b09 - std::panicking::begin_panic_handler::{{closure}}::h0a64bc82e36bedc7
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:582:13
  12:     0x558ab686f6a6 - std::sys_common::backtrace::__rust_end_short_backtrace::hc203444fb7416a16
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/sys_common/backtrace.rs:150:18
  13:     0x558ab6870862 - rust_begin_unwind
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
  14:     0x558ab66ba193 - core::panicking::panic_fmt::h0f6ef0178afce4f2
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
  15:     0x558ab66ba5e3 - core::result::unwrap_failed::h8090202169109f9c
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
  16:     0x558ab66c9798 - tokio::runtime::park::CachedParkThread::block_on::he68682098847e7b5
  17:     0x558ab66c9c36 - tokio::runtime::context::runtime::enter_runtime::h57a07d4b664d4d90
  18:     0x558ab672c101 - tokio::runtime::runtime::Runtime::block_on::h29109ff0d856e13d
  19:     0x558ab66bf78f - ek1100::main::h1550c917b3a86bf5
  20:     0x558ab672d756 - std::sys_common::backtrace::__rust_begin_short_backtrace::h15abfc1a7efe2366
  21:     0x558ab66c1511 - std::rt::lang_start::{{closure}}::h5e04481cc6f084b8
  22:     0x558ab68686fe - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hb1327dc2ef3fecdf
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:287:13
  23:     0x558ab68686fe - std::panicking::try::do_call::h4044173225fe83dd
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:485:40
  24:     0x558ab68686fe - std::panicking::try::hd8a722c09d156a53
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:449:19
  25:     0x558ab68686fe - std::panic::catch_unwind::hd2ca07971cf0119b
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panic.rs:140:14
  26:     0x558ab68686fe - std::rt::lang_start_internal::{{closure}}::h26d89d595cf47b70
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/rt.rs:148:48
  27:     0x558ab68686fe - std::panicking::try::do_call::hf47aa1aa005e5f1a
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:485:40
  28:     0x558ab68686fe - std::panicking::try::h73d246b2423eaf4e
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:449:19
  29:     0x558ab68686fe - std::panic::catch_unwind::hbaaeae8f1b2f9915
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panic.rs:140:14
  30:     0x558ab68686fe - std::rt::lang_start_internal::h76f3e81e6b8f13f9
                               at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/rt.rs:148:20
  31:     0x558ab66bf865 - main
  32:     0x7f03ac6f6d90 - <unknown>
  33:     0x7f03ac6f6e40 - __libc_start_main
  34:     0x558ab66ba775 - _start
  35:                0x0 - <unknown>
jubeormk1 commented 1 year ago

Way more exciting when using a EL5151. No panic at all. The example reveals information about the slaves but since the EL5151 does not have any output nothing happens in the loop.

Info output:

[2023-07-11T02:55:18Z INFO  ek1100] Starting EK1100 demo...
[2023-07-11T02:55:18Z INFO  ek1100] Ensure an EK1100 is the first slave, with any number of modules connected after
[2023-07-11T02:55:18Z INFO  ek1100] Run with RUST_LOG=ethercrab=debug or =trace for debug information
[2023-07-11T02:55:21Z INFO  ek1100] Discovered 2 slaves
[2023-07-11T02:55:21Z INFO  ek1100] -> Slave 0x1000 EK1100 has 0 input bytes, 0 output bytes
[2023-07-11T02:55:21Z INFO  ek1100] -> Slave 0x1001 EL5151-0021 has 18 input bytes, 14 output bytes

Looking at the debug output there is a great deal of information about the discovered Sync managers and PDOs, which is great. Full output including debug:

full_output.log

jamwaffles commented 1 year ago

Hey, thanks for the detailed issue! Much appreciated. I've got an EK1100 and EL2004 on my test rig and running the ek1100 example in release mode I can't reproduce your working counter panic I'm afraid. Can you rerun the EK1100 + EL2004 setup but get a Wireshark capture and post it here? Please run in debug mode for the capture, i.e. no --release.

Just checking - you're using latest master in your fork, right?

P.s. if you have the just task runner installed (cargo install just) you can run the examples with RUST_LOG=debug just linux-example-release ek1100 (release) or RUST_LOG=debug just linux-example ek1100 (debug) - I've set those commands up to use setcap so you don't have to go searching for binaries in target/.

jubeormk1 commented 1 year ago

Hi again @jamwaffles, I follow your advice and capture the packets while running the example.

Tests performed

I capture the packets in the ethercat interface while running RUST_LOG=debug just linux-example-release ek1100 ethercat0 in the next situations:

The logs and packets recorded are attacked can be accesed here: wireshark-ethercrab-EK1100.zip

Some additional information for others reading this issue thread:

To run wireshark in a headless machine you need to install wireshark and tshark. In an ubuntu machine that would be done by: sudo apt install wireshark tshark after that run: > sudo tshark -i ethercat0 -F pcapng

jubeormk1 commented 1 year ago

I made some progress troubleshooting this issue:

Following the lead the problem happens at client.init<const MAX_SLAVES: usize, G> when calling dc::configure_dc(self, slaves.as_mut_slices().0).await? in client.rs.

I took that out of the equation and now and the debug output is describing all the sync managers, FMMUs, FMMUs mappings PDOs, etc. More here: output.log.

You can see the changes that I made in the next gist Hot fix for ethercrab at commit 29f329f.

I will keep working from there.

jamwaffles commented 1 year ago

Thanks for the extra logs. I've got a small change for you to try if you don't mind. In src/dc.rs, fn latch_dc_times, look for this block:

// Latch receive times into all ports of all slaves.
client
    .bwr(RegisterAddress::DcTimePort0, 0u32)
    .await
    .expect("Broadcast time")
    .wkc(num_slaves as u16, "Broadcast time")?;

and replace it with:

// Latch receive times into all ports of all slaves.
client
    .bwr(RegisterAddress::DcTimePort0, 0u32)
    .await
    .expect("Broadcast time");

(just delete the .wkc(...) line)

This ignores the working counter and lets your program proceed a bit further to the point it prints some better debug info.

Please post the log of RUST_LOG=debug ... and another wireshark trace here. Please don't run it with --release as the test harness I wrote to replay Wireshark captures doesn't work well with release builds, making it more difficult for me to reproduce locally..

jamwaffles commented 1 year ago

Hey @jubeormk1 this issue was closed by #69 (nice) but I can't confirm if the fix works all the way through. Please give latest master a test and reopen here if it still doesn't work :).

jubeormk1 commented 1 year ago

Hi again!

I have been working in other things and did not test this and did not realised that I was working on my "patched" version. I have more feedback for my particular system:

after pulling the latest version of the master branch the error has changed:

$ ~/ethercrab-rs/ethercrab$ just linux-example ek1100 enp5s0f2
cargo build --example ek1100 && sudo setcap cap_net_raw=pe ./target/debug/examples/ek1100 && ./target/debug/examples/ek1100 enp5s0f2
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
[2023-07-20T23:37:02Z INFO  ek1100] Starting EK1100 demo...
[2023-07-20T23:37:02Z INFO  ek1100] Ensure an EK1100 is the first slave, with any number of modules connected after
[2023-07-20T23:37:02Z INFO  ek1100] Run with RUST_LOG=ethercrab=debug or =trace for debug information
thread 'main' panicked at 'Init: WorkingCounter { expected: 1, received: 0, context: Some("Port receive times") }', examples/ek1100.rs:84:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: Recipe `linux-example` failed on line 4 with exit code 101

The string "Port receive times" is only found in src/dc.rs:latch_dc_times:

      let [time_p0, time_p1, time_p2, time_p3] = sl
          .read::<[u32; 4]>(RegisterAddress::DcTimePort0, "Port receive times")
          .await?;

Adding some silly logging in latch_dc_times inside the for slave in slaves loop the first slave to fail is IL1004. the iterator is going over slaves that support and do not support dc:

[2023-07-21T00:02:21Z INFO  ek1100] Starting EK1100 demo...
[2023-07-21T00:02:21Z INFO  ek1100] Ensure an EK1100 is the first slave, with any number of modules connected after
[2023-07-21T00:02:21Z INFO  ek1100] Run with RUST_LOG=ethercrab=debug or =trace for debug information
[2023-07-21T00:02:22Z INFO  ethercrab::dc] Trying slave (address: "EK1100")
[2023-07-21T00:02:22Z INFO  ethercrab::dc] Trying slave (address: "EL5151-0021")
[2023-07-21T00:02:22Z INFO  ethercrab::dc] Trying slave (address: "EL1014")
thread 'main' panicked at 'Init: WorkingCounter { expected: 1, received: 0, context: Some("Port receive times") }', examples/ek1100.rs:84:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Here is my workaround that iterates over the slaves that support dc borrowing your filter for num_slaves_with_dc.

jamwaffles commented 1 year ago

Second attempt from #73 seems to make this work now. Please reopen (again lol) if this breaks (again).

jubeormk1 commented 1 year ago

I can confirm that it works for EK1100 + EL5151 +EL1004 + 5x EL2004. Thanks!