P1sec / pysctp

SCTP stack for Python
http://www.p1sec.com
165 stars 67 forks source link

ProcessLookupError: [Errno 3] No such process #50

Closed tariromukute closed 1 year ago

tariromukute commented 1 year ago

Hi. Thank you for the project.

I am using pysctp in a project implementing a 5GC traffic generator. The library works fine, however when I increase the number of UEs hence the data rate I start getting the error below.

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/core-tg/src/NGAPSim.py", line 261, in _ngap_to_ue_thread_function
    self.sctp.send(ngap_pdu.to_aper())
  File "/home/ubuntu/core-tg/src/SCTP.py", line 71, in send
    self.__socket.sctp_send(data, ppid=socket.htonl(60))
  File "/home/ubuntu/core-tg/.venv/lib/python3.8/site-packages/sctp.py", line 1178, in sctp_send
    return _sctp.sctp_send_msg(self._sk.fileno(), msg, to, ppid, flags, stream, timetolive, context)
ProcessLookupError: [Errno 3] No such process

I don't know where to start debugging. I also don't know if this is a limitation or bug with the package or with my implementation. I would appreciate any pointers. The SCTP.py is as below, you can find the project here

import socket
import logging
import sctp

logger = logging.getLogger('__SCTPClient__')

class SCTPClient():
    """ This class is a base class for all SCTP client contexts. """
    def __init__(self, server_config):
        try:
            self.__socket = self._load_socket()
        except:
            logger.exception("Failed to create socket")
            raise

        self.server_config = server_config['amfConfigs'][0]
        self.connect()

    def connect(self) -> None:
        logger.info("Connecting to 5G Core")
        logger.info("Server config: {}".format(self.server_config))
        self.__socket.connect((self.server_config['address'], self.server_config['port']))

    def disconnect(self) -> None:
        logger.info("Disconnecting from 5G Core")
        self.__socket.close()

    def _load_socket(self):
        # Create SCTP socket
        return sctp.sctpsocket_tcp(socket.AF_INET)

    def send(self, data: bytes):
        self.__socket.sctp_send(data, ppid=socket.htonl(60))

    def recv(self) -> bytes:
        if self.__socket._closed:
            return None
        sctp_data = self.__socket.recv(4096)
        if not sctp_data:
            logger.info('SCTP disconnection')
            return None
        return sctp_data

Thank you in advance

p1-bmu commented 1 year ago

Thanks for submitting this issue.

Looking at your implementation, it seems you have 2 concurrent threads working on a single SCTP socket: _ngap_to_ue_thread_function and _ue_to_ngap_thread_function are both running in the background. I guess it may happen that you have send() and recv() calls which could overlap. As pysctp releases the GIL when calling sctp_sendmsg() and sctp_recvmsg(), this could potentially lead to such an error. Otherwise, I don't have much ideas from the information provided. Eventually, you can try to recompile pysctp after commenting out the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros in _sctp.c.

As a side note, you should take care to manage SCTP stream ID consistently toward the core network: this ID must be 0 for system-level NGAP procedures, while >0 for UE-related procedures.

tariromukute commented 1 year ago

Thank you @p1-bmu for the pointer. The concurrent access was likely the error. I added I/O multiplexing with select and added locking for recv and send. I am now getting a new error ConnectionResetError, which I am taking as the previous error was resolved by ensuring that the recv and send are only called one at a time. FYI please see the updated code below.

import socket
import logging
import sctp
import threading
import select

logger = logging.getLogger('__SCTPClient__')

# Create a lock to synchronize access to socket
socket_lock = threading.Lock()

class SCTPClient():
    """ This class is a base class for all SCTP client contexts. """
    def __init__(self, server_config):
        try:
            self.__socket = self._load_socket()
        except:
            logger.exception("Failed to create socket")
            raise
        # TODO: Allow connecting to multiple servers
        self.server_config = server_config['amfConfigs'][0]
        self.connect()

    def connect(self) -> None:
        logger.debug("Connecting to 5G Core")
        logger.debug("Server config: {}".format(self.server_config))
        self.__socket.connect((self.server_config['address'], self.server_config['port']))

    def disconnect(self) -> None:
        logger.debug("Disconnecting from 5G Core")
        self.__socket.close()

    def _load_socket(self):
        # Create SCTP socket
        return sctp.sctpsocket_tcp(socket.AF_INET)

    def send(self, data: bytes):
        # Acquire the socket lock before sending data
        sock = self.__socket.sock()
        while True:
            readable, writable, exceptional = select.select([], [sock], [sock], 0.1)

            if sock in writable:
                break

        socket_lock.acquire()
        try:
            self.__socket.sctp_send(data, ppid=socket.htonl(60))
        finally:
            # Release the socket lock after sending data
            socket_lock.release() 

    def recv(self) -> bytes:
        if self.__socket._closed:
            return None

        sock = self.__socket.sock()
        readable, writable, exceptional = select.select([sock], [], [sock], 0.1)

        if sock not in readable:
            return None

        # Acquire the socket lock before receiving data
        socket_lock.acquire()
        try:
            sctp_data = self.__socket.recv(4096)
            if not sctp_data:
                logger.debug('SCTP disconnection')
                return None
            return sctp_data
        finally:
            # Release the socket lock after receiving data
            socket_lock.release()

    def close(self) -> None:
        return super().close()

After resolving the new error below, I will go on to try the suggestion of recompiling the pysctp. I would appreciate any suggestions or pointers on the new error.

Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/core-tg/src/NGAPSim.py", line 294, in _ue_to_ngap_thread_function
    self.sctp.send(PDU.to_aper())
  File "/home/ubuntu/core-tg/src/SCTP.py", line 51, in send
    self.__socket.sctp_send(data, ppid=socket.htonl(60))
  File "/home/ubuntu/core-tg/.venv/lib/python3.8/site-packages/sctp.py", line 1178, in sctp_send
    return _sctp.sctp_send_msg(self._sk.fileno(), msg, to, ppid, flags, stream, timetolive, context)
ConnectionResetError: [Errno 104] Connection reset by peer

Also, thanks for the side note on the stream id, I will update accordingly.

p1-bmu commented 1 year ago

This certainly indicates that you call select() on the socket, or the socket method recv() or send(), while it has been closed by the remote peer.

tariromukute commented 1 year ago

Hi @p1-bmu. Yeah that is correct. I am trying to see why the connection is being closed, and thereafter how to handle it gracefully. I used eBPF to monitor the SCTP connection and it looks like the received window is reduced to 0 bytes. I would like to believe that is why the connection is then reset on the core network side (FYI I am testing with free5gc, also see attached graph of the receive window stats).

Since the purpose of my current study is overloading the control network functions, I assume that's a positive result for now. I just have to be certain that it's the reason the connection is closed. I will feedback my findings here as well. As above I will appreciate any pointers.

Thanks for the insights. I assume I can close this issue, or should recompile pysctp with the suggested and check if that resolves the issue as well before closing the issue?

free5gc_rwnd_plot_2000

p1-bmu commented 1 year ago

You should check the free5gc AMF logs, it will certainly indicate why SCTP sockets are getting closed or exhausted. Or eventually check the kernel logs in the machine running the AMF.

Regarding the GIL lock release in pysctp, I guess you don't need to test its removal as you solved your issue: it will only lower your overall performance on the Python side. And yes, You can close this issue.

tariromukute commented 1 year ago

Thanks @p1-bmu, I checked the logs on free5gc and the logs showing seem to indicated the connection was closed from the traffic generator side. However, my understating of the ConnectionResetError as per my search is it indicates the peer closed the connection/ The logs that lead me to thinking free5gc see the connection being dropped on the other side are below

2023-08-10T16:15:54.514674135Z [WARN][AMF][Gmm][amf_ue_ngap_id:RU:207,AU:205(3GPP)][supi:SUPI:] T3560 expires, retransmit Authentication Request (retry: 3)
2023-08-10T16:15:54.514785161Z [INFO][AMF][Ngap][amf_ue_ngap_id:RU:207,AU:205(3GPP)][ran_addr:10.0.0.16:56174] Send Downlink Nas Transport
2023-08-10T16:15:54.515228581Z [ERRO][AMF][Ngap][ran_addr:10.0.0.16:56174] Send error: no such process

I there a way to get a notification with pysctp on a connection that is closing and why it's closing (Outside the socket error message received above)?

tariromukute commented 1 year ago

Thanks @p1-bmu, I checked the logs on free5gc and the logs showing seem to indicated the connection was closed from the traffic generator side. However, my understating of the ConnectionResetError as per my search is it indicates the peer closed the connection/ The logs that lead me to thinking free5gc see the connection being dropped on the other side are below

2023-08-10T16:15:54.514674135Z [WARN][AMF][Gmm][amf_ue_ngap_id:RU:207,AU:205(3GPP)][supi:SUPI:] T3560 expires, retransmit Authentication Request (retry: 3)
2023-08-10T16:15:54.514785161Z [INFO][AMF][Ngap][amf_ue_ngap_id:RU:207,AU:205(3GPP)][ran_addr:10.0.0.16:56174] Send Downlink Nas Transport
2023-08-10T16:15:54.515228581Z [ERRO][AMF][Ngap][ran_addr:10.0.0.16:56174] Send error: no such process

I there a way to get a notification with pysctp on a connection that is closing and why it's closing (Outside the socket error message received above)?

I am seeing that is I use stop_recv instead of recv I will get more information which I assume can be useful. I am going to try that out later to see if it helps. Thanks.