KevinHoarau / BML

https://kevinhoarau.github.io/BML/
MIT License

Problems with feature transformation #7

Open YYM0093 opened 11 months ago

YYM0093 commented 11 months ago

Hi, I am having a problem collecting and transforming data with BML. I modified the code from your example, and the data collection works. However, when I run the graph feature transformation, it reports the following error and the resulting .json file is empty. Here are my code and a screenshot of the error.

#################
# Data collection

folder = "single_data/TTNet/"
dataset = Dataset(folder)

dataset.setParams({
  "PrimingPeriod": 10*60, # 10 hours of priming data
  "IpVersion": [4], # only IPv4 routes
  "Collectors": ["rrc04","rrc05"],
  "UseRibsPriming": True
})

dataset.setPeriodsOfInterests([
  {
    "name": "TTNet",
    "label": "anomaly",
    "start_time": utils.getTimestamp(2004, 12, 24, 9, 20, 0) - 6030,
    "end_time": utils.getTimestamp(2004, 12, 24, 9, 20, 0) + 6030,
  },
  {
    "name": "TTNet",
    "label": "no_anomaly_1",
    "start_time": utils.getTimestamp(2004, 12, 24, 9, 20, 0) - 6030 - 243600,
    "end_time": utils.getTimestamp(2004, 12, 24, 9, 20, 0) - 6030,
  },
  {
    "name": "TTNet",
    "label": "no_anomaly_2",
    "start_time": utils.getTimestamp(2004, 12, 24, 9, 20, 0) + 6030,
    "end_time": utils.getTimestamp(2004, 12, 24, 9, 20, 0) + 6030 + 243600,
  },
])

# run the data collection
utils.runJobs(dataset.getJobs(), folder+"collect_jobs", nbProcess=3)

# features extraction every 2 minute
datTran = DatasetTransformation(folder, "BML.transform", "GraphFeatures")

datTran.setParams({
  "global":{
    "Name": "WeightedGraphFeatures",
    "Period": 1,
  }
})

# run the data transformation
utils.runJobs(datTran.getJobs(), folder+"transform_jobs")
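Editor's sketch: the three periods of interest in the code above are meant to sit back to back around the event time. The snippet below rebuilds the same boundaries with the standard library and checks that they are contiguous. It does not use BML: `utc_timestamp` is a hypothetical stand-in for `utils.getTimestamp` and assumes the arguments are a UTC date and time, which the thread does not confirm. The offsets 6030 and 243600 are copied as they appear in the post.

```python
import calendar


def utc_timestamp(year, month, day, hour, minute, second):
    """Hypothetical stand-in for utils.getTimestamp; assumes UTC."""
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


def build_periods(event, half_window, control_span):
    """Return the three periods in chronological order."""
    return [
        {"label": "no_anomaly_1",
         "start_time": event - half_window - control_span,
         "end_time": event - half_window},
        {"label": "anomaly",
         "start_time": event - half_window,
         "end_time": event + half_window},
        {"label": "no_anomaly_2",
         "start_time": event + half_window,
         "end_time": event + half_window + control_span},
    ]


def is_contiguous(periods):
    """True if every period starts exactly where the previous one ends."""
    return all(a["end_time"] == b["start_time"]
               for a, b in zip(periods, periods[1:]))


EVENT = utc_timestamp(2004, 12, 24, 9, 20, 0)
periods = build_periods(EVENT, half_window=6030, control_span=243600)

for p in periods:
    print(p["label"], p["end_time"] - p["start_time"], "seconds")
print("contiguous:", is_contiguous(periods))
```

Printing the duration of each period is a quick way to confirm that the offsets are the ones intended before launching a long collection job.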

[Screenshot: 微信截图_20231226211708 (WeChat screenshot of the error)]

KevinHoarau commented 11 months ago

It may be due to a memory overflow issue. To limit memory usage, you can reduce the parallelization of the computation with the "nbProcess" parameter. The computation will be slower, though, so you can monitor the memory usage while it runs and increase the value if there is headroom.

Example code using "nbProcess":

datTran = DatasetTransformation(folder, "BML.transform", "GraphFeatures")

datTran.setParams({
  "global":{
    "Name": "WeightedGraphFeatures",
    "Period": 1,
    "nbProcess": 1
  }
})
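
Editor's sketch: one way to follow the advice about monitoring memory while the transformation runs. This is not part of BML; it assumes a Linux host, where the kernel exposes memory counters in `/proc/meminfo`. The helper names (`parse_meminfo`, `available_fraction`, `read_meminfo`) are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kibibytes}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_fraction(info):
    """Fraction of total memory the kernel reports as available."""
    return info["MemAvailable"] / info["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Sample text in the /proc/meminfo layout, so the sketch runs anywhere.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    4000000 kB
"""

print(f"available: {available_fraction(parse_meminfo(SAMPLE)):.0%}")
```

On a Linux machine, calling `available_fraction(read_meminfo())` every few seconds from a separate terminal or notebook cell shows whether there is room to raise "nbProcess".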
YYM0093 commented 11 months ago

Thank you for your valuable advice. I tried your method and it ran normally. It was a bit slow, but that was definitely worth it to get the results I wanted!

YYM0093 commented 11 months ago

I'm sorry to bother you again. I added "nbProcess": 1 as you suggested, and after running overnight the program reported the following error messages (it is still running and is only 1/24 complete). Is this a multi-threading related error, or are 'number_of_cliques' and 'node_clique_number' missing?

Process Process-22:1315:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/BML/BML/transform/graph.py", line 144, in runTransforms
    data[index] = self.transforms(index, G)
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 436, in transforms
    results = self.computeFeatures(G, features_nx, features_nk)
  File "/home/jovyan/BML/BML/transform/graph_features.py", line 187, in computeFeatures
    return(NodesFeatures.computeFeatures(self, G, features_nx, features_nk))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 422, in computeFeatures
    results.update(computeFeaturesParallelized(features_nx, self.params["nbProcessFeatures"], self.logFiles, self.params["verbose"]))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 322, in computeFeaturesParallelized
    r_copy[k] = results[k].copy()
  File "<string>", line 2, in __getitem__
  File "/opt/conda/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
KeyError: 'number_of_cliques'
Process Process-22:1317:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/BML/BML/transform/graph.py", line 144, in runTransforms
    data[index] = self.transforms(index, G)
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 436, in transforms
    results = self.computeFeatures(G, features_nx, features_nk)
  File "/home/jovyan/BML/BML/transform/graph_features.py", line 187, in computeFeatures
    return(NodesFeatures.computeFeatures(self, G, features_nx, features_nk))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 422, in computeFeatures
    results.update(computeFeaturesParallelized(features_nx, self.params["nbProcessFeatures"], self.logFiles, self.params["verbose"]))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 322, in computeFeaturesParallelized
    r_copy[k] = results[k].copy()
  File "<string>", line 2, in __getitem__
  File "/opt/conda/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
KeyError: 'node_clique_number'
Process Process-22:1319:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/BML/BML/transform/graph.py", line 144, in runTransforms
    data[index] = self.transforms(index, G)
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 436, in transforms
    results = self.computeFeatures(G, features_nx, features_nk)
  File "/home/jovyan/BML/BML/transform/graph_features.py", line 187, in computeFeatures
    return(NodesFeatures.computeFeatures(self, G, features_nx, features_nk))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 422, in computeFeatures
    results.update(computeFeaturesParallelized(features_nx, self.params["nbProcessFeatures"], self.logFiles, self.params["verbose"]))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 322, in computeFeaturesParallelized
    r_copy[k] = results[k].copy()
  File "<string>", line 2, in __getitem__
  File "/opt/conda/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
KeyError: 'node_clique_number'
Process Process-22:1653:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/BML/BML/transform/graph.py", line 144, in runTransforms
    data[index] = self.transforms(index, G)
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 436, in transforms
    results = self.computeFeatures(G, features_nx, features_nk)
  File "/home/jovyan/BML/BML/transform/graph_features.py", line 187, in computeFeatures
    return(NodesFeatures.computeFeatures(self, G, features_nx, features_nk))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 422, in computeFeatures
    results.update(computeFeaturesParallelized(features_nx, self.params["nbProcessFeatures"], self.logFiles, self.params["verbose"]))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 322, in computeFeaturesParallelized
    r_copy[k] = results[k].copy()
  File "<string>", line 2, in __getitem__
  File "/opt/conda/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
KeyError: 'node_clique_number'
Process Process-22:1655:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/BML/BML/transform/graph.py", line 144, in runTransforms
    data[index] = self.transforms(index, G)
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 436, in transforms
    results = self.computeFeatures(G, features_nx, features_nk)
  File "/home/jovyan/BML/BML/transform/graph_features.py", line 187, in computeFeatures
    return(NodesFeatures.computeFeatures(self, G, features_nx, features_nk))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 422, in computeFeatures
    results.update(computeFeaturesParallelized(features_nx, self.params["nbProcessFeatures"], self.logFiles, self.params["verbose"]))
  File "/home/jovyan/BML/BML/transform/nodes_features.py", line 322, in computeFeaturesParallelized
    r_copy[k] = results[k].copy()
  File "<string>", line 2, in __getitem__
  File "/opt/conda/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
KeyError: 'node_clique_number'
Process Process-22:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jovyan/BML/BML/transform/dataset_transformation.py", line 31, in transformSample
    transform(transformation, primingFile, dataFile, params=params, outFolder=outputfolder, logFiles=logFiles)
  File "/home/jovyan/BML/BML/transform/base_transform.py", line 243, in transform
    transform.execute()
  File "/home/jovyan/BML/BML/transform/base_transform.py", line 174, in execute
    self.compute()
  File "/home/jovyan/BML/BML/transform/base_transform.py", line 215, in compute
    BaseTransform.compute(self)
  File "/home/jovyan/BML/BML/transform/base_transform.py", line 106, in compute
    self.computeSnapshot(t, routes, updatesParsed)
  File "/home/jovyan/BML/BML/transform/graph.py", line 126, in computeSnapshot
    self.pq.addProcess(target=self.runTransforms, args=(self.data, i, self.data[i]))
  File "<string>", line 2, in __getitem__
  File "/opt/conda/lib/python3.9/multiprocessing/managers.py", line 825, in _callmethod
    raise convert_to_error(kind, result)
KeyError: 827
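
Editor's sketch: the tracebacks above all end in a `KeyError` raised while reading from a `multiprocessing` manager proxy. One plausible reading, which the thread does not confirm, is that a worker process died before it stored its result in the shared dict, so the later lookup found no such key. The snippet below reproduces that pattern in isolation. It is not BML's code: `worker` and `collect` are hypothetical, the crash is simulated with `os._exit`, and the "fork" start method is assumed (Unix only).

```python
import multiprocessing as mp
import os


def worker(results, key, crash):
    """Store a result under `key`, unless the worker dies first."""
    if crash:
        os._exit(1)  # simulate an abrupt death, e.g. killed by the OS
    results[key] = {"node_a": 1.0}


def collect(keys, crashing):
    """Run one worker per key and separate stored from missing results."""
    ctx = mp.get_context("fork")  # assumption: a Unix host
    with ctx.Manager() as manager:
        results = manager.dict()
        procs = [ctx.Process(target=worker, args=(results, k, k in crashing))
                 for k in keys]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        found, missing = {}, []
        for k in keys:
            try:
                # Same access pattern as the failing line in the traceback.
                found[k] = results[k].copy()
            except KeyError:
                missing.append(k)
        return found, missing


found, missing = collect(["degree", "number_of_cliques"],
                         crashing={"number_of_cliques"})
print("stored:", sorted(found))
print("missing:", missing)
```

If this reading applies, the missing feature names are a symptom rather than the cause: the feature was requested, but the process computing it never finished.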
KevinHoarau commented 11 months ago

Sorry for the delayed answer. In fact, nbProcess sets the number of graphs that are processed in parallel. However, the computation of the features on a single graph is still parallelized. The nbProcessFeatures parameter can be used to limit that, but it shouldn't impact the memory usage... How much memory do you have?
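
Editor's sketch: the two levels of parallelism described above, expressed as a parameter dict. The thread confirms only the parameter names; placing "nbProcessFeatures" under "global" next to "nbProcess" is an assumption, and the helpers `build_transform_params` and `max_worker_processes` are hypothetical. The product of the two values is a rough upper bound on concurrent feature workers, not a measured figure.

```python
def build_transform_params(nb_process=1, nb_process_features=1, period=1):
    """Build a params dict limiting both levels of parallelism."""
    return {
        "global": {
            "Name": "WeightedGraphFeatures",
            "Period": period,
            # graphs processed in parallel
            "nbProcess": nb_process,
            # feature workers per graph (assumed to be accepted here)
            "nbProcessFeatures": nb_process_features,
        }
    }


def max_worker_processes(params):
    """Rough upper bound on simultaneous feature workers."""
    g = params["global"]
    return g["nbProcess"] * g["nbProcessFeatures"]


params = build_transform_params(nb_process=1, nb_process_features=2)
print(params)
print("at most", max_worker_processes(params), "feature workers")

# With BML available, the dict would be passed as in the earlier example:
# datTran.setParams(params)
```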

YYM0093 commented 10 months ago

Sorry, I've only just seen your reply. After I switched to a computer with 60GiB of RAM, the problem was solved, and I've now collected my target dataset without any problems. Thank you very much for your reply and for BML!

YYM0093 commented 10 months ago

Hello, I have successfully collected the data I want and transformed it into statistical features. My aim is to replicate models from several papers on BGP anomaly detection. They collected the statistical features shown in the figure, but the features I obtained with BML do not seem to include all of them (e.g. Number of IGP packets, Number of EGP packets, Number of incomplete packets). I checked the BML source code, and it doesn't seem to collect them either. Can you provide more details on the features collected by BML? I'm new to BGP and may not fully understand the feature abbreviations used in BML; I apologize for that.