Closed kthare10 closed 1 year ago
Attaching the graphml of the topology which caused the error.
Need to see if this is still a problem.
@kthare10 is this error popping up in Orchestrator at graph ingestion? Is it possible somehow the model is being modified by one thread while being validated by another? This code is not thread-safe and at least at first glance this is looking to me like a race condition. Simply validating this graph after loading it doesn't create any problems.
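For reference, the error message itself is what Python raises whenever a dictionary gains or loses keys while something is iterating over it. NetworkX keeps nodes and adjacency in ordinary dictionaries, so a writer touching the graph during validation would be one way to trigger it. The sketch below is a minimal single-threaded illustration using plain dicts, not the FIM code; the function names and node names are made up for the example.

```python
def mutate_during_iteration():
    """Insert into a dict while iterating it, as a concurrent writer might."""
    nodes = {"n1": {}, "n2": {}, "n3": {}}
    try:
        for name in nodes:
            # Stand-in for another thread adding a node mid-validation.
            nodes[name + "_copy"] = {}
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_over_snapshot():
    """Iterate a copy of the keys so inserts cannot disturb the loop."""
    nodes = {"n1": {}, "n2": {}, "n3": {}}
    for name in list(nodes):
        nodes[name + "_copy"] = {}
    return len(nodes)


if __name__ == "__main__":
    print(mutate_during_iteration())
    print(iterate_over_snapshot())
```

Iterating over a snapshot only sidesteps the iteration error. It does not make concurrent updates to a shared graph consistent, so it is not a substitute for proper synchronization.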
Yes, the error pops up as soon as Orchestrator validates the incoming create slice request. No other thread is processing the slice at that point. This function is called directly from the Flask endpoint handler. I'll try to reproduce it and keep you posted.
def create_slice(self, *, token: str, slice_name: str, slice_graph: str, ssh_key: str,
                 lease_end_time: str) -> List[dict]:
    if self.globals.is_maintenance_mode_on():
        raise OrchestratorException(Constants.MAINTENANCE_MODE_ERROR)

    slice_id = None
    controller = None
    new_slice_object = None
    asm_graph = None
    try:
        end_time = self.__validate_lease_end_time(lease_end_time=lease_end_time)

        controller = self.controller_state.get_management_actor()
        self.logger.debug(f"create_slice invoked for Controller: {controller}")

        # Validate the slice graph
        topology = ExperimentTopology(graph_string=slice_graph)
        topology.validate()
        ....
It looks like the issue is that the NetworkXGraphImporter class has a built-in NetworkXGraphStorage class, which uses a singleton object, __NetworkXGraphStorage, that is not thread-safe.
Also, in the current CF implementation, the graph is not explicitly deleted from NetworkXGraphStorage when the topology object goes out of scope, so this singleton instance must be growing quite large in production.
This has resulted in some operations taking as long as 250 seconds.
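One possible way to keep a shared store from growing is to tie each graph's lifetime to the object that owns it. The sketch below is an illustration only: `SharedStorage`, `Topology`, `add_graph`, `delete_graph` and `close` are hypothetical stand-ins, not the FIM or CF classes, and a plain dict replaces the NetworkX graph. It uses the standard-library `weakref.finalize` so the graph is removed either on an explicit `close()` or when the owner is garbage collected.

```python
import weakref


class SharedStorage:
    """Hypothetical stand-in for a process-wide graph store."""

    def __init__(self):
        self.graphs = {}  # graph_id -> set of node names

    def add_graph(self, graph_id, nodes):
        self.graphs[graph_id] = set(nodes)

    def delete_graph(self, graph_id):
        # pop() with a default makes repeated deletes harmless.
        self.graphs.pop(graph_id, None)


STORAGE = SharedStorage()


class Topology:
    """Hypothetical owner of exactly one graph in the shared store."""

    def __init__(self, graph_id, nodes):
        self.graph_id = graph_id
        STORAGE.add_graph(graph_id, nodes)
        # The callback must not reference self, or self would never be freed.
        self._finalizer = weakref.finalize(self, STORAGE.delete_graph, graph_id)

    def close(self):
        """Remove the graph now; calling this more than once is safe."""
        self._finalizer()


if __name__ == "__main__":
    topo = Topology("slice-1", ["n1", "n2"])
    print(len(STORAGE.graphs))
    topo.close()
    print(len(STORAGE.graphs))
```

Calling `close()` in a `finally` block of the request handler would make the cleanup deterministic; the finalizer is then only a safety net for code paths that forget to call it.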
class NetworkXGraphImporter(ABCGraphImporter):
    """
    Importer for NetworkX graphs. Stores graphs in a single NetworkX
    object. Care is taken to disambiguate nodes when loading graphs.
    """
    READ_FORMATS = ["json_nodelink", "graphml"]

    def __init__(self, *, logger=None):
        """
        Initialize the importer setting up storage and logger
        :param logger:
        """
        self.storage = NetworkXGraphStorage(logger=logger)

class NetworkXGraphStorage:
    def __init__(self, logger=None):
        if not NetworkXGraphStorage.storage_instance:
            NetworkXGraphStorage.storage_instance = NetworkXGraphStorage.__NetworkXGraphStorage(logger=logger)

    class __NetworkXGraphStorage:
        """
        Singleton in-memory storage for graphs
        """
        def __init__(self, logger=None):
            self.graphs = nx.Graph()
            self.start_id = 1
            self.log = logger
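If the shared singleton is kept, one common mitigation is to guard both its creation and every access with locks. The following is a rough sketch under that assumption, not a patch for the classes quoted above: `GraphStorage`, `instance`, `add_node` and `node_count` are hypothetical names, and a dict of sets stands in for the single `nx.Graph`.

```python
import threading


class GraphStorage:
    """Hypothetical lock-guarded singleton store (not the FIM class)."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self):
        self.graphs = {}  # graph_id -> set of nodes
        # Re-entrant so a locked method can call another locked method.
        self.lock = threading.RLock()

    @classmethod
    def instance(cls):
        # Check, lock, then check again so only one thread creates the store.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

    def add_node(self, graph_id, node):
        with self.lock:
            self.graphs.setdefault(graph_id, set()).add(node)

    def node_count(self):
        # Holding the lock keeps writers out while the dict is iterated.
        with self.lock:
            return sum(len(nodes) for nodes in self.graphs.values())


if __name__ == "__main__":
    store = GraphStorage.instance()
    store.add_node("slice-1", "n1")
    print(store.node_count())
```

A single coarse lock serializes all graph operations, which trades throughput for safety. Giving each request its own graph object instead of a shared one would avoid the contention altogether, at the cost of changing how the importer hands out storage.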
Topology validate fails with error: dictionary changed size during iteration
Snapshot of the stack trace: