dsnslab / NetworkSecurity


How to upload a file larger than 100 MB to the Data Visualizer? #33

Closed rreexxllii0310 closed 3 years ago

rreexxllii0310 commented 3 years ago

I have referred to closed issue #24, but I still have a problem: how can I upload packetbeat.json to the Data Visualizer when the file exceeds 100 MB? I googled it but have not found a solution (or maybe there is none?). Or do I need to use elasticdump to upload a file this large?

thanks.

VaTanJo commented 3 years ago

Maybe you can use elasticdump to upload the logs and set its limit option. The logs are actually very large, so I recommend using elasticdump.

jehoshuapratama commented 3 years ago

@rreexxllii0310 You can split the file using the split command from the command prompt and upload the pieces one by one. It can split by file size, by number of lines, and in several other ways.
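If the split command is not available, the same idea can be sketched in Python. This assumes the export is newline-delimited JSON (one event per line), so cutting on line boundaries keeps every event intact; the function name split_by_lines and the ".partNNN" suffix are made up for this example.

```python
import os
import tempfile


def split_by_lines(path, lines_per_chunk, out_dir):
    """Split a text file into numbered chunks of at most lines_per_chunk lines."""
    chunk_paths = []
    out = None
    with open(path, "r", encoding="utf-8") as src:
        for i, line in enumerate(src):
            # Start a new chunk file every lines_per_chunk lines.
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                name = f"{os.path.basename(path)}.part{len(chunk_paths):03d}"
                chunk_path = os.path.join(out_dir, name)
                out = open(chunk_path, "w", encoding="utf-8")
                chunk_paths.append(chunk_path)
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


# Demo on a throwaway 5-line file (made-up content, not real Packetbeat data).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "packetbeat.json")
    with open(src_path, "w", encoding="utf-8") as f:
        f.writelines(f'{{"n": {i}}}\n' for i in range(5))

    parts = split_by_lines(src_path, 2, tmp)
    part_names = [os.path.basename(p) for p in parts]
    part_line_counts = []
    for p in parts:
        with open(p, encoding="utf-8") as f:
            part_line_counts.append(len(f.readlines()))
```

For a real file, pick lines_per_chunk so that each piece stays under the upload limit, then upload the pieces one by one.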

ben4562002 commented 3 years ago

The file size limit can be adjusted under "Management" -> "Stack Management" -> "Kibana" -> "Advanced Settings". You can change "100MB" to "1GB".

VaTanJo commented 3 years ago

@ben4562002

Hello Ben, can I ask you a question? I am using the ELK provided for PJ1, so there is no machine learning function in Kibana. Did you use this feature? If so, how effective is it? Thanks for sharing.

rreexxllii0310 commented 3 years ago

Thank you everyone, you guys are so kind!

nianchengz commented 3 years ago

@VaTanJo I tried training a random forest. The correct rate (accuracy) per log is 76%. I'm not sure whether that is high or low; maybe the number of logs is not big enough to get a very high correct rate.

nianchengz commented 3 years ago

@VaTanJo
Besides, I didn't use the function in Kibana. I use Python to extract the features from every log and a Python package to train.

VaTanJo commented 3 years ago

@Hooje Thank you! Now I'm trying to use sklearn to train my model QQ, but this is my first exposure to machine learning. Can I ask how you deal with missing values? Because the log contents are almost all strings, I need to handle this before proceeding to the next step (one-hot encoding).

thank you!! you are so nice!!!

rreexxllii0310 commented 3 years ago

I have a problem again. I extracted some features from winlog and packetlog separately, exported them to two files, and loaded them into two dataframes. But how can I train one model on these two dataframes? I tried to merge them, but the timestamps didn't match, so I got some NaN values. If I don't merge them, I have no idea how to train. One option is to merge them and train one model; another is to train two models separately.

I have been stuck here for a long time. Please give me some guidance, thanks.
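One possible way to merge frames whose timestamps do not line up exactly is a nearest-timestamp join with pandas merge_asof. This is only a sketch: the column names (timestamp, event_id, bytes), the toy values, and the 2-second tolerance are all assumptions for illustration, not taken from the real winlog or packetlog data.

```python
import pandas as pd

# Toy stand-ins for the two feature frames (made-up values).
winlog = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01 00:00:01", "2021-01-01 00:00:05"]),
    "event_id": [4624, 4688],
})
packet = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00:00",
        "2021-01-01 00:00:04",
        "2021-01-01 00:00:30",
    ]),
    "bytes": [120, 900, 64],
})

# Both frames must be sorted on the key. Each packet row is matched to the
# closest winlog row in time, but only if it lies within the tolerance.
merged = pd.merge_asof(
    packet.sort_values("timestamp"),
    winlog.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("2s"),
)
```

Rows with no winlog event inside the tolerance still get NaN in event_id, so they need to be filled or dropped before training. Whether a nearest-time match is meaningful depends on how the two logs relate to each other.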

jehoshuapratama commented 3 years ago

Sorry, I am using a rule-based method. I am afraid I cannot help you with that, as I might lead you down the wrong path... sorry

nianchengz commented 3 years ago

@VaTanJo
First question: dealing with missing values. I use the pandas read_csv function to open the .csv file, and there is a function called fillna(). You can write column['duration'] = column['duration'].fillna(10), so any missing value in the duration column becomes 10. You can decide which number to use, or use the mean of the column, like column['time'] = column['time'].fillna(column['time'].mean()). I think the keywords to google are "pandas read csv missing value" and "pandas read csv mean". There may be some warnings from pandas; you can google "python ignore warnings", which will teach you to use the "import warnings" package.
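A small runnable version of the two fillna variants described above. The inline CSV text is made up for illustration; with a real file, the path would be passed to read_csv instead of the StringIO object.

```python
import io

import pandas as pd

# Made-up CSV with one missing value in each column.
csv_text = "duration,time\n1,10\n,20\n3,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Replace missing durations with a fixed number.
df["duration"] = df["duration"].fillna(10)

# Replace missing times with the mean of the values that are present.
df["time"] = df["time"].fillna(df["time"].mean())
```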

nianchengz commented 3 years ago

@VaTanJo Besides, I converted my JSON file to a CSV file. Choose the features you want to use and write them into the CSV file, just like what you see in Excel. You can google "import csv", "python csv writer", and "python csv writerow"; it's easy to transform JSON to CSV.
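A rough sketch of that JSON-to-CSV step using only the standard library. It assumes one JSON object per line with flat keys; the field names src_ip and bytes and the helper name json_lines_to_csv are hypothetical, and real Packetbeat events are nested, so the field lookup would need adapting.

```python
import csv
import io
import json


def json_lines_to_csv(lines, fields, out):
    """Write the chosen fields of each JSON line as one CSV row."""
    writer = csv.writer(out)
    writer.writerow(fields)  # header row
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # A missing field becomes an empty cell.
        writer.writerow([event.get(field, "") for field in fields])


# Made-up events for illustration.
events = [
    '{"src_ip": "10.0.0.1", "bytes": 120, "extra": "x"}',
    '{"src_ip": "10.0.0.2"}',
]
buf = io.StringIO()
json_lines_to_csv(events, ["src_ip", "bytes"], buf)
rows = buf.getvalue().splitlines()
```

With real files, lines would be an open JSON file and out a file opened with newline="" as the csv module recommends.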

The second question is about strings. You can use the pandas function called map to assign a number to every value. If there are too many kinds of values in the column, I think you can just skip that feature.
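A minimal sketch of the map approach, with the protocol column and its values made up for illustration. pandas get_dummies is shown as well because one-hot encoding came up earlier in the thread; it is an alternative to map, not what was used here.

```python
import pandas as pd

# Made-up string feature.
df = pd.DataFrame({"protocol": ["tcp", "udp", "tcp", "icmp"]})

# Give each distinct string a number, in order of first appearance.
codes = {value: i for i, value in enumerate(df["protocol"].unique())}
df["protocol_code"] = df["protocol"].map(codes)

# Alternative: one indicator column per distinct value.
one_hot = pd.get_dummies(df["protocol"], prefix="protocol")
```

The integer codes imply an ordering that the strings do not really have, which tree-based models usually tolerate. One-hot columns avoid that but grow with the number of distinct values, which is one more reason to skip columns with too many kinds of values.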

By the way, it is also my first time using machine learning XDD. I spent so much time practicing, so I have just solved the problems you are encountering.

VaTanJo commented 3 years ago

@Hooje Thank you!!!! Your suggestions helped me a lot. Actually, I was stuck on the second question about handling strings (failed to do one-hot encoding) for 2 days. I think you are right: I have to reduce the number of features and use map, then do one-hot encoding again.

I have a group meeting on Monday, and there is not enough time for me to deal with this QQQQQ. So I gave up on the ML method for now and used if-else conditions to complete it. Because I used ELK for log analysis before, I know which attacks correspond to which circumstances. But I will try to use ML to complete it!! Maybe after all the final exams, other projects....XDDDD

Thank you very much!!

VaTanJo commented 3 years ago

@rreexxllii0310 Maybe you can train the model on the network logs first? The TAs said that using both types of logs is not required.

eggggbert commented 3 years ago

Here is a detailed document about elasticdump for reference ~

dsnslab commented 3 years ago

Closed as no further discussion