faizann24 / Fwaf-Machine-Learning-driven-Web-Application-Firewall

Machine learning driven web application firewall to detect malicious queries with high accuracy.
http://fsecurify.com/fwaf-machine-learning-driven-web-application-firewall/

Script not working? #8

Open thlevy opened 6 years ago

thlevy commented 6 years ago

Hi,

Has the ML script been run recently? I tried to run it with both Python 3 and Python 2 and got errors. For instance, with Python 2 (possibly 2.7.15), I got the following log:

(py27) C:\Users\Thomas Lev\Documents\Fwaf\Fwaf-Machine-Learning-driven-Web-Application-Firewall-master>python script.py
C:\Users\Thomas Lev\AppData\Local\conda\conda\envs\py27\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "script.py", line 43, in <module>
    X = vectorizer.fit_transform(queries)
  File "C:\Users\Thomas Lev\AppData\Local\conda\conda\envs\py27\lib\site-packages\sklearn\feature_extraction\text.py", line 1381, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "C:\Users\Thomas Lev\AppData\Local\conda\conda\envs\py27\lib\site-packages\sklearn\feature_extraction\text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Users\Thomas Lev\AppData\Local\conda\conda\envs\py27\lib\site-packages\sklearn\feature_extraction\text.py", line 792, in _count_vocab
    for feature in analyze(doc):
  File "C:\Users\Thomas Lev\AppData\Local\conda\conda\envs\py27\lib\site-packages\sklearn\feature_extraction\text.py", line 255, in <lambda>
    return lambda doc: self._char_ngrams(preprocess(self.decode(doc)))
  File "C:\Users\Thomas Lev\AppData\Local\conda\conda\envs\py27\lib\site-packages\sklearn\feature_extraction\text.py", line 116, in decode
    doc = doc.decode(self.encoding, self.decode_error)
  File "C:\Users\Thomas Lev\AppData\Local\conda\conda\envs\py27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 1: invalid start byte

Thanks

Thomas

Interfish commented 6 years ago

It looks like a decoding error. Try re-cloning the project and make sure your badQueries and goodQueries text files are encoded as UTF-8, or convert them. This may help: https://stackoverflow.com/questions/4182603/how-to-convert-a-string-to-utf-8-in-python
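
One way to check the suggestion above is to read each file as raw bytes and see which lines fail to decode. The following is a minimal sketch, not part of the repository: the helper names `find_invalid_utf8_lines` and `reencode_to_utf8` are hypothetical, and the `latin-1` default for the source encoding is only an assumption, since the thread does not say how the query files were actually encoded.

```python
def find_invalid_utf8_lines(path):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad = []
    with open(path, "rb") as f:  # binary mode: nothing is decoded implicitly
        for number, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((number, raw))
    return bad


def reencode_to_utf8(src_path, dst_path, source_encoding="latin-1"):
    """Rewrite a file as UTF-8, assuming it was saved in source_encoding.

    The latin-1 default is a guess for illustration: it never fails to
    decode, but the result is only meaningful if the guess is right.
    """
    with open(src_path, "rb") as src:
        text = src.read().decode(source_encoding)
    # newline="" keeps the original line endings untouched on every platform
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Running `find_invalid_utf8_lines` on the bad and good query files would show whether only a few lines are affected or the whole file uses a different encoding, which helps decide between converting the file and simply skipping the bad bytes.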

thlevy commented 6 years ago

Thanks. It was indeed a decoding error: some bytes in the text files are not valid UTF-8. I got it working by adding errors='ignore' to the open call, like this: with open(filepath, 'r', errors='ignore') as f:

Thomas
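
For reference, the workaround above can be sketched as a small loader. This is an illustration under assumptions, not the repository's code: `load_queries` is a hypothetical name, and `io.open` is used because the built-in `open` accepts the `errors` keyword only in Python 3, while `io.open` accepts it in both Python 2 and Python 3.

```python
import io


def load_queries(filepath, errors="ignore"):
    """Read one query per line, tolerating bytes that are not valid UTF-8.

    errors="ignore" silently drops undecodable bytes; errors="replace"
    substitutes U+FFFD instead, which keeps a visible trace of the damage.
    """
    with io.open(filepath, "r", encoding="utf-8", errors=errors) as f:
        # Skip blank lines and strip surrounding whitespace from each query
        return [line.strip() for line in f if line.strip()]
```

Note that `errors="ignore"` changes the data: a query containing a stray byte such as 0xc0 is loaded with that byte removed, so the model is trained on a slightly different string than the one in the file. Whether that matters for detection accuracy is not discussed in this thread.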