dmvss / group-project

Real-time VPN traffic classifier.

The part about data processing #1

Open 0111w21 opened 11 months ago

0111w21 commented 11 months ago

Hello, I would like to ask how to obtain the training_data.csv and merged_files2.csv files used in the code. I don't see where the other files process the VPN-nonVPN dataset (ISCXVPN2016). I hope you can reply when you see this. Thank you!

dmvss commented 11 months ago

Hi. The files you're referring to were obtained by merging different parts of the dataset, and the code does not cover this process. We took the parts of the dataset we thought were viable, merged them, and labeled them. From there, the code interacts with the data only by referencing the trained model. Unfortunately, this was done a while ago and I wasn't the one responsible, so I can't really elaborate further.
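For anyone trying to reproduce that step, a minimal sketch of merging and labeling could look like the following. This is not the code the group used. It assumes each selected part of the dataset has already been exported to a CSV with the same header, and the column names (`duration`, `bytes`), the label values, and the helper names `merge_and_label` and `write_csv` are all hypothetical.

```python
import csv
import io


def merge_and_label(parts):
    """Merge several CSV sources into one list of rows with a 'label' column.

    parts: iterable of (file_like, label) pairs. All sources are assumed
    to share the same header row (an assumption, not a dataset guarantee).
    """
    merged = []
    for source, label in parts:
        for row in csv.DictReader(source):
            row["label"] = label  # tag every row with the class of its source
            merged.append(row)
    return merged


def write_csv(rows, out):
    """Write the merged rows to a file-like object, header first."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory stand-ins for per-capture feature files (made-up values).
vpn_part = io.StringIO("duration,bytes\n1.2,300\n0.4,120\n")
nonvpn_part = io.StringIO("duration,bytes\n2.5,900\n")

rows = merge_and_label([(vpn_part, "VPN"), (nonvpn_part, "nonVPN")])

out = io.StringIO()
write_csv(rows, out)
merged_csv = out.getvalue()
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and the output would go to a file on disk instead of a string.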

Our group barely knew Git and Python at the time, so the code isn't well written or documented. Also, the accuracy of the classifier is terrible. Please take note of that.

Here's a file that seems to be the one you're referring to, in case you want to take a look, but it might be incompatible. It's only a part of the file, since the original was approx. 30 MB and I wasn't able to attach it.

merged_sample.csv