malvidin closed 4 years ago
Unfortunately this rule doesn't catch all SMTP sessions. I tested against the CDMC Spam dataset, and it only hit on 1977 of 4327 e-mails.
Is this data available for testing?
Sorry about that, apparently the official links are dead. I did however find them in this repo:
Do you want it to match all those files? Or only on valid SMTP data?
The lack of matching is due to the source data not being valid SMTP: the line endings (LF instead of CRLF) do not conform to RFC 5321, section 2.3.8. The rule matches if the source data is corrected. Alternatively, the rule can be modified to match on messages with modified line endings.
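As a rough illustration of what "correcting" the source data could look like, here is a minimal Python sketch that rewrites bare LF line endings as CRLF before the data is scanned. The function name `normalize_line_endings` is hypothetical and is not part of the rule or the project; this is only one way to do the conversion.

```python
import re

# Matches a line feed that is NOT already preceded by a carriage return.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def normalize_line_endings(data: bytes) -> bytes:
    """Return a copy of data in which every bare LF is replaced by CRLF.

    Existing CRLF sequences are left untouched, so the function is safe
    to apply to data that is already compliant or has mixed endings.
    """
    return _BARE_LF.sub(b"\r\n", data)


if __name__ == "__main__":
    sample = b"From: a@example.com\nSubject: test\n\nbody\n"
    print(normalize_line_endings(sample))
```

Applying it twice gives the same result as applying it once, which makes it reasonable to run unconditionally on a test corpus.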
I expected YARA to raise a warning on a hex string of { 0A 0A }, but it did not in this test set. I expect that it could cause ERROR_TOO_MANY_MATCHES more frequently than { 0D 0A 0D 0A }, but both might.
All of those files are legitimate SMTP sessions, but that doesn't mean they adhere to the RFC. I wouldn't put much stock in clients or servers adhering to the RFC. The YARA rule should definitely still match payloads that are valid (albeit not strictly RFC-compliant) SMTP sessions.
If someone has a file that looks like an email but uses LF endings, I expect that they would want the dispatch rule to send the data to the SMTP plugin. I haven't seen any SMTP network traffic that uses LF endings, but I will look. Maybe something cleans them up, or they're rejected before I can see them.
Thanks, @malvidin !
The SMTP YARA dispatcher was broken due to an error on my part in string identifier naming. This corrects that error, and checks for the header values between the start of the file and the end of the header (the first empty line).
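The check described above can be sketched outside YARA in a few lines of Python. This is not the rule itself: the helper names and the particular header names in `required` are assumptions chosen for illustration, and the sketch accepts both CRLF and bare LF endings, as discussed earlier in the thread.

```python
def header_end(data: bytes) -> int:
    """Return the offset of the first empty line, or -1 if there is none.

    Both the CRLF form (0D 0A 0D 0A) and the bare LF form (0A 0A) are
    accepted; whichever occurs first marks the end of the header.
    """
    offsets = [i for i in (data.find(b"\r\n\r\n"), data.find(b"\n\n")) if i != -1]
    return min(offsets) if offsets else -1


def looks_like_email(data: bytes, required=(b"from:", b"to:", b"subject:")) -> bool:
    """Check that every required header name starts a line in the header.

    The header names in `required` are hypothetical defaults, not the
    strings used by the actual dispatch rule.
    """
    end = header_end(data)
    if end == -1:
        return False
    # Only inspect the region between the start of the file and the header end.
    lines = data[:end].lower().splitlines()
    return all(any(line.startswith(name) for line in lines) for name in required)


if __name__ == "__main__":
    msg = b"From: a@example.com\nTo: b@example.com\nSubject: hi\n\nbody"
    print(looks_like_email(msg))
```

Restricting the search to the bytes before the first empty line keeps header-like text in the message body from causing a match.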