DUNE-DAQ / nanorc


`start_connectivity_server` and port offsets #200

Closed: plasorak closed this issue 1 year ago

plasorak commented 1 year ago

When nanorc has to start the connectivity server and there are port offsets (for example in nanotimingrc and nano04rc, or with --partition-number), the connectivity server is started on a port that isn't the one set in the environment. As a result, the applications started afterwards crash immediately on boot.

bieryAtFnal commented 1 year ago

Sorry for the bother, but I'm curious why I don't see this also.

When I use the following daqconf.json file

{
  "boot": {
    "use_connectivity_service": true,
    "start_connectivity_service": true,
    "connectivity_service_host": "localhost",
    "connectivity_service_port": 15432
  }, 
  "readout": {
    "clock_speed_hz": 62500000,
    "data_rate_slowdown_factor": 1,
    "use_fake_cards": true,
    "default_data_file": "asset://?label=WIBEth&subsystem=readout"
  },
  "trigger": {
    "trigger_window_before_ticks": 1000,
    "trigger_window_after_ticks": 1000,
    "trigger_rate_hz": 1.0
  }
}

and the following dro_map.json file

[
    {
        "src_id": 100,
        "geo_id": {
            "det_id": 3,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": 0
        },
        "kind": "eth",
        "parameters": {
            "protocol": "udp",
            "mode": "fix_rate",
            "rx_iface": 0,
            "rx_host": "localhost",
            "rx_mac": "00:00:00:00:00:00",
            "rx_ip": "0.0.0.0",
            "tx_host": "localhost",
            "tx_mac": "00:00:00:00:00:00",
            "tx_ip": "0.0.0.0"
        }
    },
    {
        "src_id": 101,
        "geo_id": {
            "det_id": 3,
            "crate_id": 1,
            "slot_id": 0,
            "stream_id": 1
        },
        "kind": "eth",
        "parameters": {
            "protocol": "udp",
            "mode": "fix_rate",
            "rx_iface": 0,
            "rx_host": "localhost",
            "rx_mac": "00:00:00:00:00:00",
            "rx_ip": "0.0.0.0",
            "tx_host": "localhost",
            "tx_mac": "00:00:00:00:00:00",
            "tx_ip": "0.0.0.0"
        }
    }
]

I'm able to successfully run

daqconf_multiru_gen -c ./daqconf.json --detector-readout-map-file ./dro_map.json my_test_config
nanorc --partition-number 2 my_test_config ${USER}-test boot conf start_run 111 wait 60 stop_run scrap terminate

Thanks for any information on what to do differently.

plasorak commented 1 year ago

I think you are right, Kurt; thanks for following this up. We had misspelt the connectivity server host in our configuration, which resulted in the behaviour observed. However, I cannot find where in the code the port offset is applied to the daq_application environment variables related to the connectivity server port. I'll follow up with Gordon.

plasorak commented 1 year ago

Okay, I followed this up, and everything works as expected thanks to this snippet: https://github.com/DUNE-DAQ/nanorc/blob/132cf03384bd23c942f7529c2c77465e5f159362/src/nanorc/sshpm.py#L177. It updates the environment if and only if the connectivity service is started by nanorc. Sorry for the noise.
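
For readers who do not want to open the link, the behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in sshpm.py: the function name `connectivity_env`, the environment variable names `CONNECTION_SERVER` and `CONNECTION_PORT`, and the offset value of 200 are all hypothetical. Only the base port 15432 comes from the daqconf.json above.

```python
def connectivity_env(base_env, configured_port, port_offset, started_by_rc):
    """Return a copy of base_env with the connectivity port filled in.

    The offset is applied only when run control itself starts the
    connectivity service; an externally managed service keeps the
    configured port unchanged.
    """
    env = dict(base_env)  # copy, so the caller's environment is not mutated
    port = configured_port + port_offset if started_by_rc else configured_port
    env["CONNECTION_PORT"] = str(port)  # hypothetical variable name
    return env


# Service started by run control: applications see the shifted port.
managed = connectivity_env({"CONNECTION_SERVER": "localhost"}, 15432, 200, True)
# Service managed externally: the configured port is passed through as-is.
external = connectivity_env({"CONNECTION_SERVER": "localhost"}, 15432, 200, False)
print(managed["CONNECTION_PORT"], external["CONNECTION_PORT"])
```

With a rule of this shape, the server and the applications agree on the port whenever nanorc launches the service, which is consistent with the run above succeeding under --partition-number 2.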