atlanticwave-sdx / sdx-lc

Local Controller of AtlanticWave SDX.
https://www.atlanticwave-sdx.net
MIT License

error handler on pull_topo_changes script #144

Closed: italovalcy closed this issue 3 weeks ago

italovalcy commented 2 months ago

Hi,

If you start SDX-LC and OXPO/Kytos together, and Kytos takes a bit longer than SDX-LC to start, the pull_topo_changes.py routine will die with the following error:

Traceback (most recent call last):
  File "/usr/src/app/sdx_lc/jobs/pull_topo_changes.py", line 97, in <module>
    main()
  File "/usr/src/app/sdx_lc/jobs/pull_topo_changes.py", line 27, in main
    process_domain_controller_topo(db_instance)
  File "/usr/src/app/sdx_lc/jobs/pull_topo_changes.py", line 52, in process_domain_controller_topo
    pulled_topology = urllib.request.urlopen(OXP_PULL_URL).read()
  File "/usr/local/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/local/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.9/urllib/request.py", line 1375, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/lib/python3.9/urllib/request.py", line 1350, in do_open
    r = h.getresponse()
  File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
    response.begin()
  File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)

I suggest we add an error handler around the code that retrieves the topology from OXPOs to cover situations like this. It would also be nice if the handler covered timeouts, to make the job more robust.
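
To make the suggestion concrete, here is a minimal sketch of what such a handler could look like. It is not the actual SDX-LC code: `fetch_topology`, `fetch_topology_with_retries`, and the default timeout, attempt count, and delay are hypothetical names and values chosen for illustration. Only `urllib.request.urlopen` and its `timeout` parameter come from the standard library as-is.

```python
import http.client
import logging
import time
import urllib.error
import urllib.request
from typing import Optional

logger = logging.getLogger(__name__)


def fetch_topology(url: str, timeout: float = 10.0) -> Optional[bytes]:
    """Return the raw response body, or None if the OXPO could not be reached.

    Hypothetical helper: the name and the 10 second default are assumptions.
    """
    try:
        # Passing timeout= keeps the job from blocking forever on a stalled peer.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, http.client.HTTPException) as exc:
        # OSError covers URLError, socket timeouts, and connection resets;
        # HTTPException covers malformed or truncated HTTP responses.
        logger.warning("Could not pull topology from %s: %r", url, exc)
        return None


def fetch_topology_with_retries(
    url: str,
    attempts: int = 5,
    delay: float = 2.0,
    timeout: float = 10.0,
) -> Optional[bytes]:
    """Try a few times, so a slow-starting OXPO does not kill the job."""
    for attempt in range(1, attempts + 1):
        body = fetch_topology(url, timeout=timeout)
        if body is not None:
            return body
        if attempt < attempts:
            time.sleep(delay)  # fixed pause between attempts
    return None
```

In the job, the call on line 52 of the traceback could then become something like `pulled_topology = fetch_topology_with_retries(OXP_PULL_URL)`, with the caller skipping the update when the result is `None`. Whether a fixed delay or a backoff is the better fit depends on how the job is scheduled, which I have not checked.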

sajith commented 2 months ago

@italovalcy The precise exception is missing from the backtrace. Do you happen to remember what it was?

There is a try-except block that catches a urllib.request.URLError around the offending line. It sounds like it probably should catch more than just that error.

https://github.com/atlanticwave-sdx/sdx-lc/blob/7d6e15b859180542eb29e3a484598360c7ce07df/sdx_lc/jobs/pull_topo_changes.py#L51-L56

sajith commented 2 months ago

Also, speaking strictly for myself, would be good to learn how to reproduce this. I don't know how to run OXPO/Kytos, or SDX-LC and OXPO/Kytos together.

italovalcy commented 1 month ago

> @italovalcy The precise exception is missing from the backtrace. Do you happen to remember what it was?
>
> There is a try-except block that catches a urllib.request.URLError around the offending line. It sounds like it probably should catch more than just that error.
>
> https://github.com/atlanticwave-sdx/sdx-lc/blob/7d6e15b859180542eb29e3a484598360c7ce07df/sdx_lc/jobs/pull_topo_changes.py#L51-L56

Hi @sajith, here is the error with the latest version of SDX-LC:

tenet-sdx-lc  | INFO:sdx_lc.messaging.topic_queue_consumer: [MQ] Awaiting requests from queue:'amq.gen-j7tUYfkxDMeuWBsYXhX-ig' with exchange_name: 'connection' routing_key:'tenet.ac.za' (MQ_HOST: 192.168.0.12, MQ_PORT: 5672)
tenet-sdx-lc  | Traceback (most recent call last):
tenet-sdx-lc  |   File "/usr/src/app/sdx_lc/jobs/pull_topo_changes.py", line 97, in <module>
tenet-sdx-lc  |     main()
tenet-sdx-lc  |   File "/usr/src/app/sdx_lc/jobs/pull_topo_changes.py", line 27, in main
tenet-sdx-lc  |     process_domain_controller_topo(db_instance)
tenet-sdx-lc  |   File "/usr/src/app/sdx_lc/jobs/pull_topo_changes.py", line 52, in process_domain_controller_topo
tenet-sdx-lc  |     pulled_topology = urllib.request.urlopen(OXP_PULL_URL).read()
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/urllib/request.py", line 214, in urlopen
tenet-sdx-lc  |     return opener.open(url, data, timeout)
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/urllib/request.py", line 517, in open
tenet-sdx-lc  |     response = self._open(req, data)
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/urllib/request.py", line 534, in _open
tenet-sdx-lc  |     result = self._call_chain(self.handle_open, protocol, protocol +
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/urllib/request.py", line 494, in _call_chain
tenet-sdx-lc  |     result = func(*args)
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/urllib/request.py", line 1375, in http_open
tenet-sdx-lc  |     return self.do_open(http.client.HTTPConnection, req)
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/urllib/request.py", line 1350, in do_open
tenet-sdx-lc  |     r = h.getresponse()
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
tenet-sdx-lc  |     response.begin()
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
tenet-sdx-lc  |     version, status, reason = self._read_status()
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/http/client.py", line 281, in _read_status
tenet-sdx-lc  |     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
tenet-sdx-lc  |   File "/usr/local/lib/python3.9/socket.py", line 704, in readinto
tenet-sdx-lc  |     return self._sock.recv_into(b)
tenet-sdx-lc  | ConnectionResetError: [Errno 104] Connection reset by peer
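
A note on why the existing `except URLError` block does not help here: the traceback shows the error surfacing from `h.getresponse()`, and as far as I can tell `urllib` only wraps errors raised while sending the request in `URLError`, so a reset while reading the response arrives as a plain `ConnectionResetError`. The sketch below only checks the exception relationships involved; `caught_by_urlerror_handler` is a hypothetical helper written for this illustration.

```python
import urllib.error


def caught_by_urlerror_handler(exc: BaseException) -> bool:
    """Report whether an `except urllib.error.URLError` clause would catch exc."""
    try:
        raise exc
    except urllib.error.URLError:
        return True
    except Exception:
        return False


# The error from the log above, built by hand for the check.
reset = ConnectionResetError(104, "Connection reset by peer")
# A refused connection, as urllib reports it when sending the request fails.
refused = urllib.error.URLError(ConnectionRefusedError(111, "Connection refused"))

print(caught_by_urlerror_handler(refused))         # True
print(caught_by_urlerror_handler(reset))           # False
print(isinstance(reset, OSError))                  # True
print(issubclass(urllib.error.URLError, OSError))  # True
```

Since both exceptions derive from `OSError`, catching `OSError` (or at least `ConnectionError` alongside `URLError`) around the `urlopen` call would cover this case.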

italovalcy commented 1 month ago

> Also, speaking strictly for myself, would be good to learn how to reproduce this. I don't know how to run OXPO/Kytos, or SDX-LC and OXPO/Kytos together.

Hi @sajith, yes, sure thing. Here is how to reproduce it: https://sdx-docs.readthedocs.io/en/latest/sdx_deploy_single_server.html

This documentation should cover the whole process of running the components. I still have to update it to reflect the recent changes in SDX-LC and SDX-Controller, but the differences should be easy to identify.