bgruszka / PHPAntiSpam

PHPAntiSpam - spam detection using bayesian analysis
MIT License

Report a bug #5

Closed nimasdj closed 3 years ago

nimasdj commented 7 years ago

Hello @bgruszka

For the email text below, with ['category' => 'spam', 'content' => 'automated'], I get the warning below with both the Burton and Graham methods. Please advise.

WARNING: Division by zero in bgruszka/phpantispam/src/Math.php on line 23

here is email text:

Hello,

This is an automated email to let you know, due to new year holiday, our support staff are working part-time, please be patient to receive an answer and bear with us! :)

Regards,

This e-mail message is confidential and may contain legally privileged information. If you are not the intended recipient you should not read, copy, distribute, disclose or otherwise use the information in this e-mail. Please delete the message from your system and notify us for delivery.

nimasdj commented 7 years ago

I get the same error for the email below, with the keyword 'delivery'. Why am I getting this warning?

Lot No. 23 Patte D'Oie 03 BP 2147 Cotonou, Benin Republic.

FEDEX PACKAGE DELIVERY NOTIFICATION

Your package worth the total sum of $2,200,000 United State Dollar in an ATM CARD is here in our office.It was deposited to this office by the United Nations {HEAD CONSULTANT} saying that it was won by your email address in their ongoing online random selection Compensation scheme 2016 in association with the International Corporation Benin Republic OIL&GAS LTD.

What you have to do now is to contact our delivery department with your information and as soon as we confirm your details our delivery team will commence with the delivery of your package to your designated address immediately.

You would required to submit information bellow:

Your Full Name: Your Home address where the ATM CARD will be delivered: Your Phone Number: YOUR NEAREST AIRPORT: Your Country:

We shall proceed on your delivery as soon as we confirm your information, I also wish to inform you that you will be taking the responsible of the delivery charges being $75usd only.

NOTE THAT;,you are to contact our Head Dispatch Officer Mr.Frank Roberts with all this information to avoid wrong delivery thanks.

We Await your prompt positive response.


FEDEX Express ®Courier Company.

nimasdj commented 7 years ago

I added a debug print in Math.php, and this is the output:

Array
(
    [this] => Array
        (
            [probability] => 0
            [usefulness] => 0.5
        )

    [is] => Array
        (
            [probability] => 0
            [usefulness] => 0.5
        )

    [delivery] => Array
        (
            [probability] => 1
            [usefulness] => 0.5
        )

    [home] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [where] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [name:] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [bellow:] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [be] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [full] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [phone] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [country:] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [shall] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [airport:] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [nearest] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )

    [submit] => Array
        (
            [probability] => 0.5
            [usefulness] => 0
        )
)

So I guess you should have an if clause within the foreach to check whether the probability is greater than zero. I am going to open a pull request.
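To make the failure mode concrete, here is a minimal sketch of how the values in the debug output above can produce a zero denominator. It assumes the probabilities are combined with Graham's formula, P = prod(p) / (prod(p) + prod(1 - p)); what line 23 of Math.php actually does is not shown in this thread, and `combineProbabilities` is a hypothetical name, not the library's API. When the evidence contains both a 0 and a 1, both products are zero and the division is undefined, so a check on a single probability being greater than zero would not be enough on its own.

```php
<?php
/**
 * Hypothetical helper, not the library's Math.php: combine per-lexeme
 * spam probabilities with Graham's formula
 *   P = prod(p) / (prod(p) + prod(1 - p)).
 * Returns null when the denominator is zero, which happens when the
 * evidence contains both a 0 and a 1.
 */
function combineProbabilities(array $probabilities)
{
    // Product of the spam probabilities.
    $spam = array_product($probabilities);

    // Product of the complementary (non-spam) probabilities.
    $ham = array_product(array_map(function ($p) {
        return 1 - $p;
    }, $probabilities));

    $denominator = $spam + $ham;
    if ($denominator == 0) {
        return null; // undefined result: the caller decides how to treat it
    }

    return $spam / $denominator;
}

// Probabilities of "this", "is", "delivery" and "home" from the dump above.
var_dump(combineProbabilities([0, 0, 1, 0.5])); // NULL instead of a warning

// Ordinary evidence still yields a probability.
var_dump(combineProbabilities([0.9, 0.8, 0.5])); // roughly 0.973
```

Returning null rather than a default such as 0.5 is a design choice in this sketch: it keeps "no usable evidence" distinguishable from "genuinely undecided".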

bgruszka commented 7 years ago

Hi,

Thanks for the report. Can you attach the full script you used? And can you tell me which PHP version you are using?

I don't think a lexeme should have a probability of 0 in this case, or that this is the right place to fix it, but I need to try to reproduce it :)

nimasdj commented 7 years ago

Hi,

Please use the first text as the email text, together with the following: ['category' => 'spam', 'content' => 'automated'], ['category' => 'nospam', 'content' => ' this is no spam'],

For the second text as the email text, please use:

['category' => 'spam', 'content' => 'delivery'], ['category' => 'nospam', 'content' => 'this is no spam'],

I use PHP 5.6. If you still cannot reproduce it, please let me know and I will provide the full script.

nimasdj commented 7 years ago

@bgruszka I apologize, I accidentally submitted the comment while typing. I have edited it above. Please re-read it.

nimasdj commented 7 years ago

@bgruszka Could you reproduce it with the instructions above?

bgruszka commented 7 years ago

Hi, yes, thanks :) I made some fixes and released a new version, 0.2.1.

nimasdj commented 7 years ago

Was the problem caused by Unicode? Which Unicode character was in my test? What was the problem? Even if we assume that the probability should never be zero, I think it is still good to have an if clause in the Math class.
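A complementary way to make sure a probability is never exactly zero or one, separate from any guard in the Math class, is to clamp each lexeme probability before it is combined. The sketch below is only an illustration: `clampProbability` is a hypothetical helper, the 0.01 and 0.99 bounds are the ones Graham suggests in "A Plan for Spam", and nothing in this thread says that version 0.2.1 works this way.

```php
<?php
/**
 * Hypothetical helper, not part of PHPAntiSpam: keep a lexeme probability
 * away from exactly 0 or 1 so that products of p and (1 - p) stay non-zero.
 * The default bounds are an assumption borrowed from Graham's essay.
 */
function clampProbability($p, $min = 0.01, $max = 0.99)
{
    return max($min, min($max, $p));
}

// Raw values as they appear in the debug output earlier in the thread.
$raw = ['this' => 0, 'delivery' => 1, 'home' => 0.5];

// array_map with a single array preserves the string keys.
$clamped = array_map('clampProbability', $raw);

print_r($clamped); // this => 0.01, delivery => 0.99, home => 0.5
```

With every value inside the clamped range, neither product in the combining formula can be exactly zero for a small, fixed number of lexemes, so the division is always defined.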