MichaelAquilina / SpamFilter

Classification of emails using machine learning and natural language processing techniques in Java

Possible Text and Parser Improvements #21

Closed by MichaelAquilina 10 years ago

MichaelAquilina commented 10 years ago

Note: all examples given are taken from the actual extracted data

Parser Improvements

Remove base64 encoded images (Fixed)

Improve detection of mailto items (Fixed)

Separate href, name, src etc. from the actual value (Fixed). Note that \u003 represents '=' in the extracted data

Detect formatted numbers (Fixed)

Detect Date and Time

Ignore file names / file paths
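
To make the items above more concrete, here is a minimal sketch of how some of them could be handled with regular expressions before tokenisation. It is not the code used in SpamFilter: the class name TokenNormalizer, the placeholder tokens (MAILTO, DATE, TIME, NUMBER) and every pattern are assumptions made for illustration, and the patterns only cover simple cases (numeric dates, comma-grouped numbers, a short list of file extensions). Splitting href, name and src from their values is left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the SpamFilter code base.
final class TokenNormalizer {
    // Inline images embedded as base64 data URIs.
    private static final Pattern BASE64_IMAGE = Pattern.compile(
            "data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+");

    // mailto: links up to the next whitespace, quote or angle bracket.
    private static final Pattern MAILTO = Pattern.compile(
            "mailto:[^\\s\"'<>]+", Pattern.CASE_INSENSITIVE);

    // File names, optionally preceded by a drive letter and directories.
    // The extension list is an assumption and would need tuning on real data.
    private static final Pattern FILE_PATH = Pattern.compile(
            "(?:[A-Za-z]:)?[/\\\\]?(?:[\\w.-]+[/\\\\])*[\\w-]+"
                    + "\\.(?:gif|jpe?g|png|html?|pdf|doc|exe)\\b",
            Pattern.CASE_INSENSITIVE);

    // Numeric dates such as 12/05/2013 or 1-2-99.
    private static final Pattern DATE = Pattern.compile(
            "\\b\\d{1,2}[/-]\\d{1,2}[/-]\\d{2,4}\\b");

    // Times such as 10:30 or 10:30:15.
    private static final Pattern TIME = Pattern.compile(
            "\\b\\d{1,2}:\\d{2}(?::\\d{2})?\\b");

    // Comma-grouped numbers such as 1,250,000 or 3,500.75.
    private static final Pattern FORMATTED_NUMBER = Pattern.compile(
            "\\b\\d{1,3}(?:,\\d{3})+(?:\\.\\d+)?\\b");

    private TokenNormalizer() {
    }

    static String normalize(String text) {
        String out = text;
        // Removed outright: base64 images and file names / paths.
        out = BASE64_IMAGE.matcher(out).replaceAll("");
        // Collapsed into a single placeholder token each.
        out = MAILTO.matcher(out).replaceAll("MAILTO");
        out = FILE_PATH.matcher(out).replaceAll("");
        out = DATE.matcher(out).replaceAll("DATE");
        out = TIME.matcher(out).replaceAll("TIME");
        out = FORMATTED_NUMBER.matcher(out).replaceAll("NUMBER");
        // Tidy up the gaps left by removed spans.
        return out.replaceAll("\\s+", " ").trim();
    }
}
```

With an invented input (not taken from the data set), TokenNormalizer.normalize("Call by 12/05/2013 at 10:30 to win 1,250,000 dollars") returns "Call by DATE at TIME to win NUMBER dollars". Mapping every date, time or number to one token keeps the vocabulary small. Whether to drop a span or replace it with a placeholder is a design choice that should be checked against the feature selection results.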

xhochy commented 10 years ago

I think we have most of this stuff integrated. No more changes, though, or all the feature selection graphs will be screwed up.