Open michenriksen opened 2 years ago
I like the look of that. I think it needs a creation date or something like that. I called it `last seen` before, but that probably isn't right. I just wanted something that could be used to indicate whether any entries need double-checking if they haven't been updated for a while, especially things like the big scanners. I can imagine someone like Nessus arbitrarily changing their UA on a point version change just because someone wanted to.
A tool to check the regex against the examples would be cool and a useful way to validate that regex worked. I don't know much about the GitHub PR checks, but I'm fairly sure it could be built into that.
Ah, right, I forgot about the `last seen` value! Perhaps we could call it something like `reviewed_at` to better communicate when the entry was last checked for correctness?
edit: updated the proposed schema to include a `reviewed_at` value.
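As a rough illustration of how a `reviewed_at` value could flag entries that need double-checking, here is a minimal sketch. The ISO date format (`YYYY-MM-DD`), the one-year threshold, and the function name `needs_review` are all assumptions made for this example; the thread does not settle on any of them.

```python
from datetime import date, timedelta


def needs_review(reviewed_at: str, today: date, max_age_days: int = 365) -> bool:
    """Return True when the entry was last reviewed more than max_age_days ago.

    reviewed_at is assumed to be an ISO date string such as "2021-06-30".
    """
    last_reviewed = date.fromisoformat(reviewed_at)
    return today - last_reviewed > timedelta(days=max_age_days)


# Hypothetical usage with a fixed "today" so the result is reproducible.
stale = needs_review("2020-01-01", today=date(2022, 1, 1))
fresh = needs_review("2021-12-01", today=date(2022, 1, 1))
```

A shorter threshold could be used for the big scanners, whose UAs may change on any release.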
> A tool to check the regex against the examples would be cool and a useful way to validate that regex worked. I don't know much about the GitHub PR checks, but I'm fairly sure it could be built into that.
Yes, automatic "unit testing" on PRs should definitely be relatively straightforward to add. I can look into setting that up, unless you want to give it a go? :)
I'm quite happy for you to set things up. Want me to give you access to make it easier?
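The regex-versus-examples check discussed above could look roughly like the sketch below. It assumes each entry is a mapping with `match` and `examples` keys, as in the proposal; the entry shown is made up for illustration. Note that it uses Python's built-in `re` module as a stand-in, which is not the RE2 engine, so a real check would ideally run the patterns through an RE2 binding instead.

```python
import re


def find_unmatched_examples(entry: dict) -> list[str]:
    """Return the example User Agents that the entry's `match` regex misses."""
    pattern = re.compile(entry["match"])
    return [ua for ua in entry["examples"] if pattern.search(ua) is None]


# Made-up entry for illustration only; not taken from the project's data.
entry = {
    "name": "ExampleScanner",
    "match": r"^ExampleScanner/\d+\.\d+",
    "examples": [
        "ExampleScanner/1.0",
        "ExampleScanner/2.13 (+https://scanner.example)",
    ],
}

unmatched = find_unmatched_examples(entry)
```

A PR check could run this over every entry and fail if any entry returns a non-empty list.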
This is a proposal for a schema change which is substantial enough to be worth considering in the early phase:
Proposal:
- `match`: An RE2-compatible regular expression to match User Agents against. The RE2 engine is fast, ReDoS-safe, and compatible with many languages
- `name`: Name of the tool
- `url`: Tool website or GitHub repository to get more information
- `examples`: A list of actual User Agents from the tool. This keeps the value from the current schema of containing actual complete attack tool UAs and can be used in automated testing to ensure the regular expression actually matches expected UAs
- `confidence`: Some kind of indication of how confident someone can be that a match from this signature would be an actual attack tool request, and not a tool masking as a real browser UA. This would be useful for filtering purposes (e.g. if a UA contains `masscan`, it's very likely Masscan)

Benefit
The main benefit of the proposed schema is the more flexible and future-proof matching of User Agents, which avoids the current need to create multiple entries for the same tool in order to accommodate different version numbers and URLs present in the UA. Exact User Agents are still captured in the `examples` list, which is something that is unique to this project (as far as I have seen). The `url` and `confidence` values would make a match more actionable to an analyst, as they would know where to get more information as well as how confident they can be in the finding.

If this looks like a good idea, I will gladly help convert the current entries to the new format!
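To make the filtering use of `confidence` concrete, here is a small sketch of matching a User Agent against entries in the proposed shape. Everything specific in it is an assumption for illustration: the `low`/`medium`/`high` scale, the `identify` helper, and both entries, which are invented rather than taken from the project. As before, Python's `re` module stands in for an RE2 engine.

```python
import re

# Assumed confidence scale; the proposal leaves the exact values open.
CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}

# Invented entries in the proposed shape, for illustration only.
ENTRIES = [
    {
        "match": r"^ExampleScanner/\d+\.\d+",
        "name": "ExampleScanner",
        "url": "https://scanner.example",
        "examples": ["ExampleScanner/1.2"],
        "confidence": "high",
    },
    {
        "match": r"^Mozilla/5\.0 \(compatible; ExampleBot\)",
        "name": "ExampleBot",
        "url": "https://bot.example",
        "examples": ["Mozilla/5.0 (compatible; ExampleBot)"],
        "confidence": "low",
    },
]


def identify(user_agent: str, entries: list[dict], min_confidence: str = "low") -> list[dict]:
    """Return entries that match user_agent at or above min_confidence."""
    threshold = CONFIDENCE_ORDER[min_confidence]
    return [
        entry
        for entry in entries
        if CONFIDENCE_ORDER[entry["confidence"]] >= threshold
        and re.search(entry["match"], user_agent)
    ]


high_hits = [e["name"] for e in identify("ExampleScanner/1.2", ENTRIES, "high")]
filtered = identify("Mozilla/5.0 (compatible; ExampleBot)", ENTRIES, "high")
```

An analyst could then use the `url` of each returned entry to look up the tool, and raise `min_confidence` to cut down on signatures that a real browser UA might also trigger.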