Rocionightwater / ML-NIDS-for-SCADA

In this work, we aim to develop a NIDS (Network Intrusion Detection System) that detects attacks targeting SCADA systems in a concrete industrial use case scenario.
MIT License

For Specification #11

Closed: sudo-thamaraikannan closed this issue 3 years ago

sudo-thamaraikannan commented 3 years ago

I need preprocessed NumPy data for a specific classification task. When I try to run preprocess_data_lstm.py with the respective args, I get the following error. Please help me get past it.

payload_features = payload_features.view(np.float64).reshape(payload_features.shape + (-1,))
ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype
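One plausible cause, offered as an assumption rather than a diagnosis from this thread: if payload_features is a NumPy structured array, `.view(np.float64)` reinterprets the raw bytes of each record as float64 values, which only works when every record's size is a multiple of 8 bytes (for example, when all fields are float64). If any column was loaded with another type, such as int32 or a string, the record size may not be a multiple of 8 and NumPy raises this error. The sketch below reproduces it with a hypothetical two-field array and shows `numpy.lib.recfunctions.structured_to_unstructured`, which casts field by field instead of reinterpreting bytes.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for payload_features: one float64 field and one
# int32 field, so each packed record is 8 + 4 = 12 bytes (not a multiple of 8).
payload_features = np.array(
    [(1.0, 2), (3.0, 4)],
    dtype=[("a", np.float64), ("b", np.int32)],
)

try:
    # The pattern from the traceback: reinterpret each record's bytes as float64.
    payload_features.view(np.float64).reshape(payload_features.shape + (-1,))
except ValueError as err:
    print("view failed:", err)

# Cast each field to float64 and stack the fields as columns instead.
dense = rfn.structured_to_unstructured(payload_features, dtype=np.float64)
print(dense)
```

Checking `payload_features.dtype` right before the failing line would show whether any field is not float64.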

Rocionightwater commented 3 years ago

Hi Thamarai,

All the already preprocessed datasets are in the folder results. There you can find the datasets for SVM, Random Forest and LSTM for all the different missing values strategies* and for a splitting strategy of 60% training set, 20% validation set and 20% test set.

Missing values strategies*:
1- Clustering - Gaussian Mixture Model (GMM)
2- Clustering - K-means
3- Zeros imputation & indicators technique
4- Keeping the closest preceding feature value that exists (i.e. is not missing)
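Strategies 3 and 4 from the list can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's preprocessing code: the function names keep_previous and zeros_with_indicators are hypothetical, the input is assumed to be a 2-D float array with NaN marking missing values, and the GMM and K-means strategies are not covered.

```python
import numpy as np

def keep_previous(x):
    """Strategy 4 (sketch): replace each NaN with the closest preceding
    non-missing value in the same column. Leading NaNs stay NaN."""
    out = np.asarray(x, dtype=np.float64).copy()
    for j in range(out.shape[1]):
        last = np.nan
        for i in range(out.shape[0]):
            if np.isnan(out[i, j]):
                out[i, j] = last
            else:
                last = out[i, j]
    return out

def zeros_with_indicators(x):
    """Strategy 3 (sketch): replace NaNs with 0 and append one 0/1
    indicator column per feature marking where values were missing."""
    x = np.asarray(x, dtype=np.float64)
    missing = np.isnan(x)
    filled = np.where(missing, 0.0, x)
    return np.hstack([filled, missing.astype(np.float64)])

X = np.array([[1.0, np.nan], [np.nan, 5.0], [3.0, np.nan]])
print(keep_previous(X))
print(zeros_with_indicators(X))
```

A value missing in the very first row has no preceding observation, so keep_previous leaves it as NaN here; how the repository handles that case is not stated in the thread.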

Here is an example of how to run the code. Say you want to run the random_forest_hyperparameters.py program with the binary dataset (1 for malicious packets and 0 for benign ones), the missing values strategy "keep" (take the value of the previous observation in the dataset), mean and standard deviation normalization, and 100 iterations for the hyperparameter search. Then you just need to run on the command line:

python src/random_forest_hyperparameters.py -d results/processed_data_RF/time-series-datasets/binary-ts-mean-keep -i 100

The first path (src/random_forest_hyperparameters.py) is where the program is located, and the path after -d is where the dataset is located; adapt both depending on where you host the code and dataset files. Do not forget to change the output directory as well, otherwise the output of the program will be saved where the dataset is by default. Reminder: the datasets for Long Short-Term Memory (LSTM), SVM and Random Forests are all in the folder results (I used the dataset split configuration: 60% training set, 20% validation set and 20% test set).
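For readers who want to reproduce a 60/20/20 split with mean and standard deviation normalization on their own data, a minimal sketch follows. It assumes a chronological split without shuffling (plausible for the time-series datasets, but not confirmed in this thread) and that the normalization statistics are computed on the training set only; split_60_20_20 and mean_std_normalize are hypothetical helpers, not functions from the repository.

```python
import numpy as np

def split_60_20_20(x):
    """Chronological 60/20/20 split (assumption: rows are already in time order)."""
    n = len(x)
    i_val = n * 6 // 10   # end of the training block
    i_test = n * 8 // 10  # end of the validation block
    return x[:i_val], x[i_val:i_test], x[i_test:]

def mean_std_normalize(train, *others):
    """Z-score every split using the per-feature mean and std of the training split."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return [(a - mean) / std for a in (train, *others)]

X = np.arange(20, dtype=np.float64).reshape(10, 2)
train_set, val_set, test_set = split_60_20_20(X)
train_n, val_n, test_n = mean_std_normalize(train_set, val_set, test_set)
print(train_set.shape, val_set.shape, test_set.shape)  # (6, 2) (2, 2) (2, 2)
```

Fitting the statistics on the training split only keeps information from the validation and test sets out of the preprocessing step.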

Best regards, Rocio

(In reply to Thamarai Kannan's message of Fri, 20 Aug 2021 at 14:42.)


sudo-thamaraikannan commented 3 years ago

Thanks, Rocio.