POETSII / Orchestrator

The Orchestrator is the configuration and run-time management system for POETS platforms.

Orchestrator_examples/ring_test.xml hangs #324

Open m8pple opened 1 year ago

m8pple commented 1 year ago

If I try to run Orchestrator_examples/ring_test.xml, it compiles and loads correctly, then appears to hang.

I'm assuming (?) this should work, as it is the example from the Orch_VolII documentation. I also tried Orchestrator_examples/ping_pong_test.xml, and that worked fine.

dt10@byron:~/Orchestrator$ Tests/ReferenceXML/run_app_standard_outputs.exp /home/dt10/Orchestrator_examples/ring_test.xml  60
Relative xml path = /home/dt10/Orchestrator_examples/ring_test.xml
Absolute xml path = /home/dt10/Orchestrator_examples/ring_test.xml
spawn sh -c /home/dt10/Orchestrator/Tests/ReferenceXML/../../orchestrate.sh 2>&1
POETS>load /app = "/home/dt10/Orchestrator_examples/ring_test.xml"
POETS> 21:58:28.01:  20(I) The microlog for the command 'load /engine = "../Config/POETSHardwareOneBox.ocfg"' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_28p0.plog'.
POETS> 21:58:28.02: 140(I) Topology loaded from file ||../Config/POETSHardwareOneBox.ocfg||.
POETS> 21:58:28.02:  23(I) load /app = "/home/dt10/Orchestrator_examples/ring_test.xml"
POETS> 21:58:28.02:  20(I) The microlog for the command 'load /app = "/home/dt10/Orchestrator_examples/ring_test.xml"' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_28p1.plog'.
POETS> 21:58:28.02: 235(I) Application file /home/dt10/Orchestrator_examples/ring_test.xml loading...
POETS> 21:58:28.02:  65(I) Application file /home/dt10/Orchestrator_examples/ring_test.xml loaded in 19 ms.
POETS>tlink /app = *
place /tfill = *
POETS>POETS> 21:58:28.02:  23(I) tlink /app = *
POETS> 21:58:28.02:  20(I) The microlog for the command 'tlink /app = *' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_28p2.plog'.
POETS> 21:58:28.02: 234(I) Typelinking graph instance 'ring_test_instance'...
POETS> 21:58:28.02: 249(I) Successfully typelinked graph instance 'ring_test_instance'.
POETS> 21:58:28.02:  23(I) place /tfill = *
POETS> 21:58:28.02:  20(I) The microlog for the command 'place /tfill = *' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_28p3.plog'.
POETS> 21:58:28.02: 309(I) Attempting to place graph instance 'ring_test_instance' using the 'tfil' method...
POETS> 21:58:28.02: 302(I) Graph instance 'ring_test_instance' placed successfully.
POETS>compose /app = *

POETS>POETS> 21:58:29.03:  23(I) compose /app = *
POETS> 21:58:29.03:  20(I) The microlog for the command 'compose /app = *' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_28p4.plog'.
POETS> 21:58:29.03: 803(I) Composing graph instance 'ring_test_instance'...
POETS> 21:58:29.03: 804(I) Graph instance 'ring_test_instance' composed successfully.
POETS>Graph appears to have loaded and compiled!

Waiting for 15.0 seconds to let HostLink init
deploy /app = *

initialise /app = *

run /app = *

POETS>POETS>POETS>POETS>POETS> 21:58:44.03:  23(I) deploy /app = *
POETS>POETS> 21:58:44.03:  20(I) The microlog for the command 'deploy /app = *' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_44p0.plog'.
POETS> 21:58:44.03: 184(I) Deployment of graph instance 'ring_test_instance' staged. Waiting for Mothership(s) to acknowledge receipt in the background.
POETS> 21:58:44.03:  23(I) initialise /app = *
POETS> 21:58:44.03:  20(I) The microlog for the command 'initialise /app = *' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_44p1.plog'.
POETS> 21:58:44.03: 187(I) Initialisation of graph instance 'ring_test_instance' staged. Waiting for Mothership(s) to acknowledge receipt in the background.
POETS> 21:58:44.03:  23(I) run /app = *
POETS> 21:58:44.03:  20(I) The microlog for the command 'run /app = *' will be written to '../Output/Microlog/Microlog_2022_07_13T21_58_44p2.plog'.
POETS> 21:58:44.03: 188(I) Run of graph instance 'ring_test_instance' staged. Waiting for Mothership(s) to acknowledge receipt in the background.
POETS> 21:58:44.03: 529(I) Mothership (rank 2): Deployment of application 'ring_test::ring_test_instance' (to this Mothership) complete.
POETS> 21:58:44.03: 186(I) Application 'ring_test::ring_test_instance' successfully deployed on all Motherships it is mapped to.
POETS> 21:58:44.03: 530(I) Mothership (rank 2): Initialising fully-defined application 'ring_test::ring_test_instance'.
POETS> 21:58:44.09: 531(I) Mothership (rank 2): Initialisation of application 'ring_test::ring_test_instance' (to this Mothership) complete.
POETS> 21:58:44.09: 186(I) Application 'ring_test::ring_test_instance' ready to start on all Motherships it is mapped to.
POETS> 21:58:44.09: 532(I) Mothership (rank 2): Starting (running) fully-initialised application 'ring_test::ring_test_instance'.
POETS> 21:58:44.09: 186(I) Application 'ring_test::ring_test_instance' running on all Motherships it is mapped to.
POETS>
STATS_fba956f3: load:0.009813, place:0.001493, compile:0.584081, run:60.070602

Timeout while running app, timeout=60.

Orchestrator version:

commit a2af253a39cabed7e401e062556949e51b330741 (HEAD -> development, origin/development, origin/HEAD)
Merge: e61a6f0 8edaf17
Author: Mark Vousden <m.vousden@soton.ac.uk>
Date:   Wed Jul 6 11:29:13 2022 +0100

    Merge pull request #318 from POETSII/BUGFIX-0317-onsystpingack-warning

Orchestrator_examples:

commit 0b6d2d319aa716c4dcb875df2eb303ab14e20503 (HEAD -> development)
Merge: 6d7e7ad 3ad3ee0
Author: Graeme Bragg <gmb@ecs.soton.ac.uk>
Date:   Thu Dec 16 03:53:04 2021 +0000

    Merge pull request #14 from POETSII/BUGFIX-0231-typenames

m8pple commented 1 year ago

It looks to me like the logic is slightly wrong, given the ReadyToSend code in the devices at:

https://github.com/POETSII/Orchestrator_examples/blob/3a3658b38179f1389c26a499538a82d7a6683c5a/ring_test.xml#L111-L115

That will cause either the sender pin to fire or the SupervisorOutPin to fire. Both of those handlers finish with DEVICESTATE(sendMessage) = 0;. So either a device message is sent along the ring and one to the supervisor is lost, or vice-versa.
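A minimal C sketch of this reading, assuming the ReadyToSend code at ring_test.xml#L111-L115 requests both output pins whenever sendMessage is non-zero, and that the softswitch re-queries ReadyToSend before servicing each pin. The bit names RTS_SENDER and RTS_SUPERVISOR, the DeviceState struct and the helper functions are hypothetical stand-ins, not the Orchestrator softswitch API. Under these assumptions only the first pin serviced ever sends:

```c
#include <assert.h>
#include <stdint.h>

#define RTS_SENDER     (1u << 0)  /* hypothetical RTS bit for the "sender" pin */
#define RTS_SUPERVISOR (1u << 1)  /* hypothetical RTS bit for SupervisorOutPin */

/* Hypothetical subset of the device state. */
typedef struct { int sendMessage; } DeviceState;

/* Assumed ReadyToSend: request both pins while a message is pending. */
static uint32_t ready_to_send(const DeviceState *s)
{
    return s->sendMessage ? (RTS_SENDER | RTS_SUPERVISOR) : 0u;
}

/* Both send handlers end with the equivalent of DEVICESTATE(sendMessage) = 0; */
static void on_send(DeviceState *s, uint32_t pin)
{
    (void)pin;
    s->sendMessage = 0;
}

/* Assumed softswitch behaviour for this reading: RTS is re-evaluated before
 * every send, so the first handler to run clears sendMessage and the other
 * pin never fires. Returns the bitmask of pins that actually sent. */
uint32_t pins_sent_if_rts_reevaluated(uint32_t first_pin)
{
    assert(first_pin == RTS_SENDER || first_pin == RTS_SUPERVISOR);
    DeviceState s = { .sendMessage = 1 };
    uint32_t order[2] = { first_pin, first_pin ^ (RTS_SENDER | RTS_SUPERVISOR) };
    uint32_t sent = 0;
    for (int i = 0; i < 2; i++) {
        if (ready_to_send(&s) & order[i]) { /* fresh RTS query per send */
            on_send(&s, order[i]);
            sent |= order[i];
        }
    }
    return sent;
}
```

Whichever pin is passed as first_pin is the only bit set in the result, which is the "one message is lost" outcome described above.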

The supervisor has count-down logic to check that all expected messages are received, so if any messages to it are lost then it will hang. Conversely, whenever a message is sent to the supervisor instead, the token in the ring is lost, which will also cause it to hang.
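To see why either outcome hangs, here is a hypothetical C model of the count-down, assuming each token visit should produce one ring message and one supervisor message, and that the supervisor expects n_devices * laps reports. The SendPolicy enum and supervisor_remaining() are illustrative names only; a non-zero return means the supervisor is still waiting, i.e. the hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies: both pins send, or only one of them per visit. */
typedef enum { BOTH_PINS_SEND, RING_PIN_ONLY, SUPERVISOR_PIN_ONLY } SendPolicy;

/* Assumed model of ring_test: the token visits n_devices devices for `laps`
 * laps, and the supervisor counts down from n_devices * laps reports.
 * Returns the supervisor's remaining count once nothing is left in flight. */
int supervisor_remaining(int n_devices, int laps, SendPolicy policy)
{
    assert(n_devices > 0 && laps > 0);
    int remaining = n_devices * laps; /* supervisor count-down */
    int visits = n_devices * laps;    /* token visits needed to finish */
    for (int v = 0; v < visits; v++) {
        bool ring_sent = (policy != SUPERVISOR_PIN_ONLY);
        bool sup_sent = (policy != RING_PIN_ONLY);
        if (sup_sent)
            remaining--;  /* supervisor receives this visit's report */
        if (!ring_sent)
            break;        /* token lost: no further visits happen */
    }
    return remaining;
}
```

For example, with 4 devices and 3 laps, BOTH_PINS_SEND returns 0, RING_PIN_ONLY leaves all 12 reports outstanding, and SUPERVISOR_PIN_ONLY leaves 11 because the token dies after the first visit.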

I might be missing something, though: is this example expected to work, or is it for documentation?

mvousden commented 1 year ago

I haven't dived particularly deep into this one, but it looks like you're using an out-of-date example: Orchestrator_examples at 0b6d2d3 is from seven months ago, whereas 3a3658b (development HEAD) is more recent.

> It looks to me like the logic is slightly wrong, given the ReadyToSend code in the devices at:
>
> https://github.com/POETSII/Orchestrator_examples/blob/3a3658b38179f1389c26a499538a82d7a6683c5a/ring_test.xml#L111-L115
>
> That will cause either the sender pin to fire or the SupervisorOutPin to fire. Both of those handlers finish with DEVICESTATE(sendMessage) = 0;. So either a device message is sent along the ring and one to the supervisor is lost, or vice-versa.

This is not correct: the RTS "bits" are not reset after every invocation of the ReadyToSend logic. What happens is:

  1. ReadyToSend code is invoked, and the RTS bit for the sender pin, and the RTS bit for the supervisor output pin, are both set high.
  2. The softswitch invokes the sender output pin code first [1]. The lap field of the outbound packet is populated, and the sendMessage field of device state is set to zero.
  3. This packet is sent, as per the softswitch's sending mechanism.
  4. The softswitch invokes the SupervisorOutPin code, populating the sourceId and lap fields of the next outbound packet, and setting the sendMessage field of device state to zero again.
  5. This packet is sent, as per step 3.
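The five steps can be sketched in C as follows, assuming ReadyToSend is evaluated once and its bits stay set until every requested pin has been serviced. As before, RTS_SENDER, RTS_SUPERVISOR, DeviceState, Packet, on_send() and service_device() are hypothetical names chosen for illustration, not the softswitch's real interface, and using -1 to mark an unset sourceId is an assumption of the sketch:

```c
#include <assert.h>
#include <stdint.h>

#define RTS_SENDER     (1u << 0)  /* hypothetical RTS bit for the "sender" pin */
#define RTS_SUPERVISOR (1u << 1)  /* hypothetical RTS bit for SupervisorOutPin */

typedef struct { int sendMessage; int lap; int id; } DeviceState; /* hypothetical */
typedef struct { int sourceId; int lap; } Packet;                 /* hypothetical */

/* Assumed ReadyToSend: request both pins while a message is pending. */
static uint32_t ready_to_send(const DeviceState *s)
{
    return s->sendMessage ? (RTS_SENDER | RTS_SUPERVISOR) : 0u;
}

/* Steps 2 and 4: populate the outbound packet, then clear sendMessage. */
static Packet on_send(DeviceState *s, uint32_t pin)
{
    Packet p = { .sourceId = -1, .lap = s->lap };
    if (pin == RTS_SUPERVISOR)
        p.sourceId = s->id;  /* only the supervisor packet carries sourceId */
    s->sendMessage = 0;
    return p;
}

/* Steps 1-5: RTS is queried once; its bits are not cleared when a handler
 * zeroes sendMessage, so every requested pin still gets serviced.
 * Writes up to 2 packets to out[] and returns how many were sent. */
int service_device(DeviceState *s, uint32_t first_pin, Packet out[2])
{
    assert(first_pin == RTS_SENDER || first_pin == RTS_SUPERVISOR);
    uint32_t rts = ready_to_send(s);  /* step 1: single evaluation */
    uint32_t order[2] = { first_pin, first_pin ^ (RTS_SENDER | RTS_SUPERVISOR) };
    int n = 0;
    for (int i = 0; i < 2; i++) {
        if (rts & order[i])                 /* bit still set from step 1 */
            out[n++] = on_send(s, order[i]); /* steps 2-5 */
    }
    return n;
}
```

Calling service_device() with either pin first yields two packets, consistent with footnote [1]: the handlers still zero sendMessage, but that no longer suppresses the second send because the RTS bits were captured in step 1.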

@heliosfa thoughts?

> I might be missing something, though: is this example expected to work, or is it for documentation?

Yes, it is supposed to work ;p

[1]: Strictly speaking, the order is undefined according to the Softswitch documentation, but the order doesn't matter in this case.