FortiGuard URL: taxonomy is too rigid

srilumpa commented 6 years ago

Request Type

Analyzer / Bug (Feature?)

Work Environment

Question	Answer
OS version (server)	Debian
OS version (client)	N/A
Cortex Analyzer Name	Fortiguard_URL
Cortex Analyzer Version	2.0
Cortex Version	2.0.4
Browser type & version	N/A

Description

When categorizing URLs or domains, the taxonomy level assigned to some categories is not "suspicious" or "malicious" even though the category clearly seems to be.

Steps to Reproduce

Analyze a domain or a URL categorized as Phishing by Fortiguard
The domain or URL will be taxonomized as "safe" by the analyzer instead of "suspicious" or "malicious"

Possible Solutions

Allow customization of taxonomy level by the Cortex organization admins instead of having it hard-coded in the analyzer.

saadkadhi commented 6 years ago

Thanks @srilumpa for raising this valid issue. @jeromeleonard or @3c7 will look into it when they have some time. However, we won't make the taxonomy adjustable before Cortex 2.2+. This feature is indeed important but it will need some serious work on our side.

jeromeleonard commented 6 years ago

The problem resides in the analyzer. We need to go through the Fortiguard categories (https://fortiguard.com/webfilter/categories), decide for each one whether it is "suspicious", "malicious", "info" or "safe", and update the code with something like this:

        # https://fortiguard.com/webfilter/categories
        maliciousCat = [
            "Malicious Websites"
        ]
        suspiciousCat = [
            "Suspicious Websites",
            "Dynamic DNS",
            "Newly Observed Domain",
            "Newly Registered Domain",
            "Phishing",
            "Spam URLs"
        ]

        infoCat = [
            "Abortion",
            "Advocacy Organizations",
            "Alcohol",
            "Alternative Beliefs",
            "Dating",
            "Gambling",
            "Lingerie and Swimsuit",
            "Marijuana",
            "Nudity and Risque",
            "Other Adult Materials",
            "Pornography",
            "Sex Education",
            "Sports Hunting and War Games",
            "Tobacco",
            "Weapons (Sales)",
            "File Sharing and Storage",
            "Freeware and Software Downloads",
            "Internet Radio and TV",
            "Internet Telephony",
            "Peer-to-peer File Sharing",
            "Streaming Media and Download",
            "Armed Forces",
            "Business",
            "Charitable Organizations",
            "Finance and Banking",
            "General Organizations",
            "Government and Legal Organizations",
            "Information Technology",
            "Information and Computer Security",
            "Online Meeting",
            "Remote Access",
            "Search Engines and Portals",
            "Secure Websites",
            "Web Analytics",
            "Web Hosting",
            "Web-based Applications",
            "Advertising",
            "Arts and Culture",
            "Auction",
            "Brokerage and Trading",
            "Child Education",
            "Content Servers",
            "Digital Postcards",
            "Domain Parking",
            "Dynamic Content",
            "Education",
            "Entertainment",
            "Folklore",
            "Games",
            "Global Religion",
            "Health and Wellness",
            "Instant Messaging",
            "Job Search",
            "Meaningless Content",
            "Medicine",
            "News and Media",
            "Newsgroups and Message Boards",
            "Personal Privacy",
            "Personal Vehicles",
            "Personal Websites and Blogs",
            "Political Organizations",
            "Real Estate",
            "Reference",
            "Restaurant and Dining",
            "Shopping",
            "Social Networking",
            "Society and Lifestyles",
            "Sports",
            "Travel",
            "Web Chat",
            "Web-based Email",
            "Child Abuse",
            "Discrimination",
            "Drug Abuse",
            "Explicit Violence",
            "Extremist Groups",
            "Hacking",
            "Illegal or Unethical",
            "Plagiarism",
            "Not Rated"
        ]

        if 'category' in raw:
            r = raw.get('category')
            value = "{}".format(r)
            if r in maliciousCat:
                level = "malicious"
            elif r in suspiciousCat:
                level = "suspicious"
            elif r in infoCat:
                level = "info"
            else:
                level = "safe"

Maybe there are other categories.
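
For anyone who wants to try the lookup outside the analyzer, the snippet above can be wrapped into a standalone function. This is only a sketch: `classify_category` and the constant names are hypothetical, the info list is shortened to a few entries from the list above, and the fallback to "safe" for unlisted categories simply mirrors the snippet.

```python
# Hypothetical standalone version of the lookup; not the analyzer's actual code.
# Category names are taken from the lists above (info list shortened here).
MALICIOUS_CATEGORIES = {"Malicious Websites"}
SUSPICIOUS_CATEGORIES = {
    "Suspicious Websites",
    "Dynamic DNS",
    "Newly Observed Domain",
    "Newly Registered Domain",
    "Phishing",
    "Spam URLs",
}
INFO_CATEGORIES = {"Gambling", "File Sharing and Storage", "Hacking", "Not Rated"}


def classify_category(raw):
    """Return (level, value) for a raw report dict, or None if it has no category."""
    if "category" not in raw:
        return None
    category = raw["category"]
    if category in MALICIOUS_CATEGORIES:
        level = "malicious"
    elif category in SUSPICIOUS_CATEGORIES:
        level = "suspicious"
    elif category in INFO_CATEGORIES:
        level = "info"
    else:
        # Same fallback as the snippet above: unlisted categories count as safe.
        level = "safe"
    return level, "{}".format(category)
```

Sets are used instead of lists so membership checks stay cheap as the category lists grow; the behaviour is otherwise the same as the if/elif chain above.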

srilumpa commented 6 years ago

I have submitted PR #296, which implements the logic you are describing, but bases the malicious and suspicious categories on two multi-value fields in the analyzer's configuration.
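
To illustrate the idea (this is not the code from the PR), a configuration-driven lookup could look roughly like the sketch below. The field names `malicious_categories` and `suspicious_categories`, the function name `level_from_config`, and the "info" default for everything else are all assumptions made for the example.

```python
# Illustrative sketch only; field names and the default level are assumptions,
# not the names or behaviour used in the actual PR.
def level_from_config(category, config, default_level="info"):
    """Map a category to a taxonomy level using two admin-configurable lists."""
    malicious = set(config.get("malicious_categories", []))
    suspicious = set(config.get("suspicious_categories", []))
    if category in malicious:
        return "malicious"
    if category in suspicious:
        return "suspicious"
    return default_level


# Example configuration an organization admin might supply (hypothetical values).
example_config = {
    "malicious_categories": ["Malicious Websites", "Phishing"],
    "suspicious_categories": ["Newly Observed Domain", "Spam URLs"],
}
phishing_level = level_from_config("Phishing", example_config)
```

With this shape, moving "Phishing" between levels only requires changing the configuration, not the analyzer code.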

jeromeleonard commented 6 years ago

thank you @srilumpa, will look at it.

TheHive-Project / Cortex-Analyzers