Closed by nttoole 2 years ago
Testing notes
The current code works for the original test, which uses made-up hostnames as failure cases.
Config:
hostnames:
- example.hostname.1
- atb-ocio-sspsim.jpl.nasa.gov
- example.hostname.2
Results:
2022-03-28T16:55:04.015 | INFO | Failed to connect to DSN at example.hostname.1. Trying next hostname.
2022-03-28T16:55:04.052 | INFO | Connection to DSN successful through atb-ocio-sspsim.jpl.nasa.gov.
2022-03-28T16:55:04.054 | INFO | Configuring SLE connection...
2022-03-28T16:55:04.055 | INFO | SLE connection configuration successful
2022-03-28T16:55:06.058 | INFO | Sending Bind request ...
2022-03-28T16:55:06.146 | INFO | Bind successful
2022-03-28T16:55:08.062 | INFO | Sending data start invocation ...
2022-03-28T16:55:08.163 | INFO | Start successful
2022-03-28T16:55:08.264 | INFO | Production Status Report: running
However, if we test with a real host that does not run an SLE service (e.g. www.google.com), we reproduce the user-reported error.
Config:
hostnames:
- www.google.com
- example.hostname.1
- atb-ocio-sspsim.jpl.nasa.gov
- example.hostname.2
Results:
2022-03-28T16:57:59.750 | INFO | Failed to connect to DSN at www.google.com: Trying next hostname.
2022-03-28T16:57:59.759 | INFO | Failed to connect to DSN at example.hostname.1. Trying next hostname.
2022-03-28T16:57:59.802 | INFO | Failed to connect to DSN at atb-ocio-sspsim.jpl.nasa.gov. Trying next hostname.
2022-03-28T16:57:59.805 | INFO | Failed to connect to DSN at example.hostname.2. Trying next hostname.
2022-03-28T16:57:59.806 | ERROR | Connection failure with DSN. Aborting ...
After applying the patch, re-running with config:
hostnames:
- www.google.com
- example.hostname.1
- atb-ocio-sspsim.jpl.nasa.gov
- example.hostname.2
...resulted in a successful connection through the third entry of hostnames:
2022-03-29T11:51:14.498 | INFO | Failed to connect to DSN at www.google.com. Trying next hostname.
2022-03-29T11:51:14.505 | INFO | Failed to connect to DSN at example.hostname. Trying next hostname.
2022-03-29T11:51:14.663 | INFO | Connection to DSN successful through atb-ocio-sspsim.jpl.nasa.gov.
2022-03-29T11:51:14.665 | INFO | Configuring SLE connection...
2022-03-29T11:51:14.666 | INFO | SLE connection configuration successful
2022-03-29T11:51:16.667 | INFO | Sending Bind request ...
2022-03-29T11:51:16.775 | INFO | Bind successful
2022-03-29T11:51:18.669 | INFO | Sending data start invocation ...
2022-03-29T11:51:18.735 | INFO | Start successful
2022-03-29T11:51:18.847 | INFO | Production Status Report: running
Testing involved running: ait/dsn/bin/examples/raf_api_test
with SSP service setup detailed here: https://github.com/NASA-AMMOS/AIT-DSN/blob/017854200dc51929dff7c7662bc2f7f52dd8eb34/ait/dsn/bin/examples/raf_api_test.py#L17
A user reported that the round-robin connection attempts all fail once the first connection attempt fails:
https://github.com/NASA-AMMOS/AIT-DSN/blob/1856df4bfd7469c0f0cbfd1a16ce7b546fb71f1c/ait/dsn/sle/common.py#L281
We should consider closing the socket and creating a new one for each connection attempt.
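For illustration, here is a minimal sketch of the "new socket per attempt" idea using only the standard library `socket` module. It is not the code in ait/dsn/sle/common.py and not necessarily what the patch does; the function name `connect_first_available`, its `port` and `timeout` parameters, and the return shape are all hypothetical. The log messages simply mirror the ones in the results above.

```python
import socket


def connect_first_available(hostnames, port, timeout=5.0):
    """Try each hostname in order, using a fresh socket for every attempt.

    Hypothetical helper: returns (sock, hostname) for the first successful
    connection, or (None, None) if every attempt fails.
    """
    for hostname in hostnames:
        # Create a new socket for each attempt so that a socket whose
        # connect() already failed is never reused for the next host.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((hostname, port))
        except OSError:
            # OSError covers DNS failures (socket.gaierror), refused
            # connections, and timeouts. Close before moving on.
            sock.close()
            print(f"Failed to connect to DSN at {hostname}. Trying next hostname.")
            continue
        print(f"Connection to DSN successful through {hostname}.")
        return sock, hostname

    print("Connection failure with DSN. Aborting ...")
    return None, None
```

The point of the sketch is only that the socket's lifetime is scoped to a single attempt: a failed attempt closes its socket, and the next hostname starts from a clean one. Whether the real fix also needs to reset other per-connection state is not covered here.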