Closed hitenkoku closed 2 years ago
As of v1.4.2, I confirmed an issue where running the same binary with the same rules against the same directory multiple times produces different detection counts. The Unique detection counts match, but the cause is still unclear. One possibility is that the ordering of records within the same time is not deterministic at run time. Below are the v1.4.2 results; v1.4.1 will be checked separately.
1st run (executed as hayabusa.exe -d hayabusa-sample-evtx -o main1.csv)

Results Summary:
Total events: 62168
Data reduction: 43137 events (69.39%)
Total detections: 31206
Total critical detections: 48
Total high detections: 6132
Total medium detections: 1283
Total low detections: 6666
Total informational detections: 17077
Unique detections: 519
Unique critical detections: 16
Unique high detections: 234
Unique medium detections: 141
Unique low detections: 75
Unique informational detections: 53
Date with most total critical detections: 2019-07-19 (14)
Date with most total high detections: 2016-09-20 (3653)
Date with most total medium detections: 2021-04-22 (146)
Date with most total low detections: 2016-09-20 (3780)
Date with most total informational detections: 2016-08-19 (2105)
2nd run (executed as hayabusa.exe -d hayabusa-sample-evtx -o main2.csv)

Results Summary:
Total events: 62168
Data reduction: 43139 events (69.39%)
Total detections: 31200
Total critical detections: 48
Total high detections: 6132
Total medium detections: 1282
Total low detections: 6662
Total informational detections: 17076
Unique detections: 519
Unique critical detections: 16
Unique high detections: 234
Unique medium detections: 141
Unique low detections: 75
Unique informational detections: 53
Date with most total critical detections: 2019-07-19 (14)
Date with most total high detections: 2016-09-20 (3653)
Date with most total medium detections: 2021-04-22 (146)
Date with most total low detections: 2016-09-20 (3780)
Date with most total informational detections: 2016-08-19 (2105)
The v1.4.1 results are as follows. The detection counts did not change across three runs, so we know the changes introduced in v1.4.2 are the cause.
Results Summary:
Total events: 62168
Data reduction: 43137 events (69.39%)
Total detections: 31212
Total critical detections: 48
Total high detections: 6132
Total medium detections: 1284
Total low detections: 6667
Total informational detections: 17081
Unique detections: 519
Unique critical detections: 16
Unique high detections: 234
Unique medium detections: 141
Unique low detections: 75
Unique informational detections: 53
Date with most total critical detections: 2019-07-19 (14)
Date with most total high detections: 2016-09-20 (3653)
Date with most total medium detections: 2021-04-22 (146)
Date with most total low detections: 2016-09-20 (3780)
Date with most total informational detections: 2016-08-19 (2105)
- 2nd run
Results Summary:
Total events: 62168
Data reduction: 43137 events (69.39%)
Total detections: 31212
Total critical detections: 48
Total high detections: 6132
Total medium detections: 1284
Total low detections: 6667
Total informational detections: 17081
Unique detections: 519
Unique critical detections: 16
Unique high detections: 234
Unique medium detections: 141
Unique low detections: 75
Unique informational detections: 53
Date with most total critical detections: 2019-07-19 (14)
Date with most total high detections: 2016-09-20 (3653)
Date with most total medium detections: 2021-04-22 (146)
Date with most total low detections: 2016-09-20 (3780)
Date with most total informational detections: 2016-08-19 (2105)
- 3rd run
Total events: 62168
Data reduction: 43137 events (69.39%)
Total detections: 31212
Total critical detections: 48
Total high detections: 6132
Total medium detections: 1284
Total low detections: 6667
Total informational detections: 17081
Unique detections: 519
Unique critical detections: 16
Unique high detections: 234
Unique medium detections: 141
Unique low detections: 75
Unique informational detections: 53
Date with most total critical detections: 2019-07-19 (14)
Date with most total high detections: 2016-09-20 (3653)
Date with most total medium detections: 2021-04-22 (146)
Date with most total low detections: 2016-09-20 (3780)
Date with most total informational detections: 2016-08-19 (2105)
@YamatoSecurity I checked. Looking at the changes since v1.4.1, the only one related to detection counts is #479. Most likely, the order in which data was inserted into the DashMap for the same second varied between runs, so aggregation-related rules sometimes fired and sometimes did not, causing the detection counts to fluctuate.
Improvements:
- When using the --update-rules option, the repository to update can now be specified with the --rules option. (https://github.com/Yamato-Security/hayabusa/issues/615) (@hitenkoku)
- Speed improvements through better parallel processing. (https://github.com/Yamato-Security/hayabusa/issues/479) (@kazuminn)
- Changed RulePath to RuleFile when the --output option is used. RuleFile now outputs only the file name in order to reduce the size of the output file. (https://github.com/Yamato-Security/hayabusa/issues/623) (@hitenkoku)
🐛 Bug fixes:
- Fixed a config folder loading error that occurred when running hayabusa with the cargo run command. (https://github.com/Yamato-Security/hayabusa/issues/618) (@hitenkoku)
I will review the implementation first and then think about countermeasures.
I see. Thank you for investigating the cause of the bug!
I was able to confirm this on my side as well. When I ran it three times in a row, the third run had 2 fewer detections. The following events were not detected:
2016-08-18 23:49:40.000 +09:00,IE10Win7,Sec,4624,info,,4764,Logon (Type 2 Interactive) *Credentials stored in memory*,User: IEUser | Comp: IE10WIN7 | IP-Addr: 127.0.0.1 | LID: 0x1b2fd,4624_LogonType-2-Interactive.yml,../hayabusa-sample-evtx/DeepBlueCLI/many-events-security.evtx
2016-08-19 00:29:29.859 +09:00,IE10Win7,Sec,4672,info,,4806,Admin Logon,User: IEUser | PrivList: SeSecurityPrivilege SeTakeOwnershipPrivilege SeLoadDriverPrivilege SeBackupPrivilege SeRestorePrivilege SeDebugPrivilege SeSystemEnvironmentPrivilege SeImpersonatePrivilege | LID: 0x1aae1,4672_AdminLogon.yml,../hayabusa-sample-evtx/DeepBlueCLI/many-events-security.evtx
Comparing against the 1.4.1 results, roughly every run misses a few events. I apologize for not noticing this during verification.
Thank you. I had not looked into which rules failed to detect, so this is very helpful. Looking at the rules you pointed out, they do not seem to use count or any other aggregation condition... The only condition operators used in those rules are and and not. In that case a single record is enough to make the judgment, so I do not understand why they were missed. I will look into it further.
Right. Some runs miss as many as 10 events, so count does not seem to be related. It may be some race condition that is hard to pin down.
It seems the processing I added is the cause of this issue. My apologies.
I found the cause. For some reason, the timeline sort processing was not merged into the v1.4.1 main branch. ↓ The sort processing: https://github.com/Yamato-Security/hayabusa/pull/622/files#diff-e6c361eb7b576e57173f4e9bce9f7b5c7d68031a3335154fe046417a4cae0fc9R246
I have not been able to figure out why it was not properly merged into the main branch.
@kazuminn Thanks for identifying the cause! No worries! So, if we fix the timeline sort, should everything be detected correctly?
@YamatoSecurity In theory, I believe it should work.
@YamatoSecurity Ah, it is 1.4.2. I was looking at the wrong version; the sort is actually included.
Please disregard what I said.
https://github.com/Yamato-Security/hayabusa/issues/639#issuecomment-1205064571
Regarding the place you pointed out: the timeline sort processing itself does appear to be included. (The code you pointed to is in v1.4.2; it is not in v1.4.1 at all.) In any case, that processing runs after the detection results have already been collected, so it seems unrelated to the detection counts differing between runs.
Yes, up to 1.4.1 it is OK, and in 1.4.2 and later events are randomly missed.
@YamatoSecurity This is a suggestion from my limited development experience, but since this is a parallelization bug and the fix looks like it will take time, how about reverting to the previous behavior for now and releasing a fixed version? That seems better for maintaining quality.
@kazuminn OK, I agree. Sorry for the extra work of doing the revert!
@kazuminn I tried to revert from the pull request, but other fixes are mixed in, so it cannot be done from the GitHub pull request page. Please create a branch and revert it by hand.
I have the same issue too.
IMHO, because the operation ↓ is not atomic, it seems to cause a race condition. https://github.com/Yamato-Security/hayabusa/blob/v1.4.3/src/detections/message.rs#L115-L123
For example, if the threads are executed in the following order, couldn't detect_info be lost?
I created a unit test as follows: https://github.com/fukusuket/hayabusa/commit/09a146666c12ee91a1de07cb0d155ae2f80f9be4 The test failed 1-2 times out of 10 on my Win10 (8-core/16GB RAM) and M1 Mac (8-core/8GB RAM).
If you have already investigated this (or I am misunderstanding), please ignore this comment🚀
@fukusuket Thank you so much for looking into this for us and for the unit test! I thought it was due to a race condition from the parallel processing, but we didn't know for sure. Do you have any ideas on how to fix it? If we can't fix it soon, we plan on reverting to how we did things before for the next 1.5.0 release so that we don't miss any events, and then adding the parallel processing back once it is stable.
Thank you for the reply😀
> Do you have any ideas on how to fix it?
I tried to fix it as follows: by using a Mutex lock (similar to version 1.4.1) to prevent the race condition.
And I just tested it with the following and it seems to work fine.
./hayabusa -d hayabusa-sample-evtx -o out.csv
I've only briefly tested it so far, so I'd like to send a pull request when testing is complete.
Thank you so much! That helps us out a bunch! Please send us a pull request when you are ready. I will be happy to help test it out.
@fukusuket Thank you for your comment and test case. I will be happy to help test it.
@YamatoSecurity The problem in this case came from removing the Mutex to speed up the processing. I would suggest checking whether the run time has changed significantly since v1.4.1, just to be sure.
Regarding fukusuket's fix: in this state there may be little reason to keep using DashMap. (Since we lock with a Mutex, the concurrent access DashMap is designed for no longer happens.)
The test fukusuket wrote is very helpful, so how about proceeding along the following lines?
@hitenkoku Thank you for review!
> Regarding fukusuket's fix: in this state there may be little reason to keep using DashMap.

That's true.
If I change the implementation (DashMap to hashbrown::HashMap) and there seems to be no problem, shall I send a pull request with the hashbrown::HashMap version?
Which implementation (DashMap or HashMap) is better for fixing this issue?
@fukusuket
Thank you for your comment.
I think hashbrown::HashMap is better than DashMap. DashMap is slow in this case because it does unnecessary locking for parallel processing.
👍 I will try the implementation with hashbrown::HashMap.
I think a BTreeMap is better than a HashMap. With a BTreeMap, keeping the data sorted also comes along with insertion. Even with a HashMap it is possible if you sort on every insert, but you will not know which one is faster until you experiment with it.
I did not understand the root cause of the DashMap issue, so I asked about it on Rust-jp. Now that I understand it, DashMap looks feasible after all, so I will try implementing it and check. https://rust-lang-jp.zulipchat.com/#narrow/stream/124300-questions/topic/DashMap.E3.81.AE.E6.80.AA.E3.81.97.E3.81.84.E6.8C.99.E5.8B.95/near/293088218
Thank you for the comment ^^
I tested the `entry(event_time).or_default()` implementation above, and the race-condition reproduction unit test never failed (even when running it more than 30 times).
@fukusuket-san, that sounds promising. If you like, would you write the PR? We'll support you.
👍 I will try the implementation with `entry(event_time).or_default()`, and after testing I'll gladly send a PR.
Sorry, the `entry(event_time).or_default()` implementation still causes a race condition😵 (In my dev environment, the race-condition reproduction unit test never fails when run from the IDE, but when I run `cargo test` from the terminal, the test fails randomly.)
```rust
pub fn insert_message(detect_info: DetectInfo, event_time: DateTime<Utc>) {
    let mut v = MESSAGES.entry(event_time).or_default();
    let (_, info) = v.pair_mut();
    info.push(detect_info);
}
```
I think the vector initialization is thread safe, but is `info.push(detect_info)` still not thread safe? Or maybe it is a race condition with another test that uses `insert_message`.
Could you please share the logs from `cargo test` when the test fails?
@fukusuket

> is `info.push(detect_info)` still not thread safe?

That part should be okay. `Entry::or_default` returns a `RefMut`, which contains a `RwLockWriteGuard<'a, HashMap<K, V, S>>` (source code). This means that the entry will be write-locked while the `RefMut` is alive. `info.push(detect_info)` is called before the `RefMut` is dropped, so the entry is write-locked, allowing only one thread to modify the entry.
Try running `cargo test` like the following:

```
cargo test -- --test-threads=1
```

Your test case modifies the `DashMap` instance bound to the static variable `MESSAGES`. By default, Cargo runs multiple tests in parallel, so if other tests also modify the same `DashMap` instance, these test cases will fail. Adding `--test-threads=1` (doc) forces Cargo to run only one test case at a time, so only one test case will modify the `DashMap` instance at a time. Note that it only controls how many test cases run in parallel, so your test case for race condition reproduction should still spawn multiple threads.
Thank you for the support. I didn't know cargo runs tests in parallel 😅
`cargo test -- --test-threads=1` failed (maybe another problem)...
```
fukusuke@fukusukenoAir hayabusa % cargo test -- --test-threads=1
    Finished test [unoptimized + debuginfo] target(s) in 0.17s
     Running unittests src/lib.rs (target/debug/deps/hayabusa-1fb759f713b8a86e)

running 205 tests
test afterfact::tests::test_emit_csv_display ... error: Found argument '--test-threads' which wasn't expected, or isn't valid in this context

        If you tried to supply `--test-threads` as a value rather than a flag, use `-- --test-threads`

USAGE:
    hayabusa.exe <INPUT> [OTHER-ACTIONS] [OPTIONS]

For more information try --help
error: test failed, to rerun pass '--lib'
```
But by running only the target test, the race condition no longer occurred😁
```
fukusuke@fukusukenoAir hayabusa % cargo test --package hayabusa --lib detections::message::tests::test_insert_message_race_condition -- --exact
    Finished test [unoptimized + debuginfo] target(s) in 0.30s
     Running unittests src/lib.rs (target/debug/deps/hayabusa-1fb759f713b8a86e)

running 1 test
test detections::message::tests::test_insert_message_race_condition ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 204 filtered out; finished in 0.05s
```
I ran the following command 5 times and confirmed that the number of detections is the same each time, so I will create a PR.
hayabusa -d ../hayabusa-sample-evtx -o ...
Attached are the command execution results. test-results.zip
Thank you everyone for your kind support🙇
I created a PR. I would appreciate it if you could test and review🙏
@fukusuket Thanks so much! I tested it and everything is working stable and the same results as 1.4.1.
PR:
Results Summary:
Total events: 5,232,442
Data reduction: 4,141,338 events (79.15%)
Total detections: 2,273,599
Total critical detections: 66 (0.00%)
Total high detections: 431,242 (18.97%)
Total medium detections: 184,057 (8.10%)
Total low detections: 128,959 (5.67%)
Total informational detections: 1,529,275 (67.26%)
Unique detections: 349
Unique critical detections: 10 (2.87%)
Unique high detections: 91 (26.07%)
Unique medium detections: 112 (32.09%)
Unique low detections: 77 (22.06%)
Unique informational detections: 59 (16.91%)
Elapsed Time: 00:16:25.306
Hayabusa 1.4.1:
Results Summary:
Total events: 5232442
Data reduction: 4140245 events (79.13%)
Total detections: 2273599
Total critical detections: 66
Total high detections: 431242
Total medium detections: 184057
Total low detections: 128959
Total informational detections: 1529275
Unique detections: 349
Unique critical detections: 10
Unique high detections: 91
Unique medium detections: 112
Unique low detections: 77
Unique informational detections: 59
Elapsed Time: 00:17:10.145
Note: the Data Reduction events total is different because of updated calculation.
@kazuminn @hitenkoku If the PR looks good to you two as well, let's merge it.
@YamatoSecurity I checked. LGTM.
Describe the bug
Different detection results on each run with the same rules and the same directory.
Steps to Reproduce:
hayabusa-1.4.2-win-x64.exe -d ../hayabusa-sample-evtx -o 1.csv
hayabusa-1.4.2-win-x64.exe -d ../hayabusa-sample-evtx -o 2.csv
Expected behavior
The same detection count on every run with the same rules and the same directory.