digininja / scanner_user_agents

A list of user agents belonging to common web scanners.
GNU General Public License v3.0

Schema Proposal #5

Open michenriksen opened 2 years ago

michenriksen commented 2 years ago

This is a proposal for a schema change which is substantial enough to be worth considering in the early phase:

Proposal:

[
  {
    "match": "masscan(-ng)?\/",
    "name": "Masscan",
    "url": "https://github.com/robertdavidgraham/masscan",
    "examples": [
      "masscan/1.3",
      "masscan-ng/1.3 (https://github.com/bi-zone/masscan-ng)",
      "masscan/1.3 (https://github.com/robertdavidgraham/masscan)"
    ],
    "known_ips": [],
    "reviewed_at": "2022-06-29",
    "confidence": "high"
  }
]

Benefit

The main benefit of the proposed schema is more flexible and future-proof matching of User Agents. A single match pattern avoids the current need to create multiple entries for the same tool just to accommodate the different version numbers and URLs present in the UA. Exact User Agents are still captured in the examples list, which is something that is unique to this project (as far as I have seen). The url and confidence fields would make a match more actionable for an analyst, who would know where to get more information as well as how confident they can be in the finding.
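To illustrate how one pattern could cover several versions of the same tool, here is a minimal Python sketch of a consumer of the proposed schema. The `identify` function and the `ENTRIES` list are hypothetical names, and the use of an unanchored, case-sensitive `re.search` is an assumption; the proposal does not say how the match value should be applied.

```python
import re

# Hypothetical in-memory copy of one entry in the proposed schema.
# In JSON the pattern is written "masscan(-ng)?\/", which decodes to the
# regex below because "\/" is simply an escaped forward slash.
ENTRIES = [
    {
        "match": r"masscan(-ng)?/",
        "name": "Masscan",
        "url": "https://github.com/robertdavidgraham/masscan",
        "confidence": "high",
    },
]


def identify(user_agent, entries=ENTRIES):
    """Return the first entry whose pattern occurs in user_agent, else None.

    Assumption: patterns are applied with re.search (unanchored, case-sensitive).
    """
    for entry in entries:
        if re.search(entry["match"], user_agent):
            return entry
    return None


hit = identify("masscan-ng/1.3 (https://github.com/bi-zone/masscan-ng)")
if hit is not None:
    print(hit["name"], hit["confidence"], hit["url"])
```

Both `masscan/1.3` and `masscan-ng/1.3 (...)` resolve to the same entry here, so no per-version entries are needed.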

If this looks like a good idea, I will gladly help converting the current entries to the new format!

digininja commented 2 years ago

I like the look of that. I think it needs a creation date or something like that. I called it last seen before, but that probably isn't right. I just wanted something that could be used to indicate whether any entries need double-checking if they haven't been updated for a while, especially for the big scanners. I can imagine someone like Nessus arbitrarily changing their UA on a point version change just because someone wanted to.

A tool to check the regex against the examples would be cool and a useful way to validate that regex worked. I don't know much about the GitHub PR checks, but I'm fairly sure it could be built into that.
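A rough idea of what such a checker could look like, as a Python sketch. The `check_entry` helper is a hypothetical name, and it assumes entries follow the proposed schema (a match pattern plus an examples list) and that a pattern "works" if `re.search` finds it in every example.

```python
import re


def check_entry(entry):
    """Return a list of problems for one entry; an empty list means it passes."""
    name = entry.get("name", "<unnamed>")
    if "match" not in entry:
        return [f"{name}: missing 'match' field"]
    try:
        pattern = re.compile(entry["match"])
    except re.error as exc:
        # The pattern itself does not compile, so no example can be tested.
        return [f"{name}: invalid regex: {exc}"]
    problems = []
    for example in entry.get("examples", []):
        if not pattern.search(example):
            problems.append(f"{name}: pattern does not match example {example!r}")
    return problems


entry = {
    "match": r"masscan(-ng)?/",
    "name": "Masscan",
    "examples": [
        "masscan/1.3",
        "masscan-ng/1.3 (https://github.com/bi-zone/masscan-ng)",
    ],
}
for problem in check_entry(entry):
    print(problem)
```

A script built around this could load the JSON file, run the check on every entry, and exit with a non-zero status if any problems are reported, which is the usual way for a PR check to signal failure.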

michenriksen commented 2 years ago

Ah, right, I forgot about the last seen value! Perhaps we could call it something like reviewed_at to better communicate when the entry was last checked for correctness?

edit: updated the proposed schema to include a reviewed_at value.
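As a sketch of how a reviewed_at value could be used to flag entries that are due for a re-check, here is a small Python example. The `stale_entries` function and the 365-day threshold are assumptions made for illustration; the thread does not settle on a review interval.

```python
from datetime import date, timedelta


def stale_entries(entries, today, max_age_days=365):
    """Return entries whose reviewed_at date is more than max_age_days before today.

    Assumption: reviewed_at is an ISO 8601 date string such as "2022-06-29".
    """
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in entries if date.fromisoformat(e["reviewed_at"]) < cutoff]


entries = [{"name": "Masscan", "reviewed_at": "2022-06-29"}]
for entry in stale_entries(entries, today=date.today()):
    print(f"{entry['name']} was last reviewed on {entry['reviewed_at']}")
```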

michenriksen commented 2 years ago

> A tool to check the regex against the examples would be cool and a useful way to validate that regex worked. I don't know much about the GitHub PR checks, but I'm fairly sure it could be built into that.

Yes, automatic "unit testing" on PRs should definitely be relatively straightforward to add. I can look into setting that up, unless you want to give it a go? :)

digininja commented 2 years ago

I'm quite happy for you to set things up. Want me to give you access to make it easier?