When RITA runs a summarize phase for a given host's IP beacons, proxy beacons, SNI beacons, or total durations, RITA will check to see if the max scoring peer has been a max scoring peer in the past. If it has, RITA will update the existing entry with the newest score.
If a particular peer is registered as a host's max scoring beacon in one import, and it comes up as the max scoring peer in the next import, but it has a lower score than before, this mechanism ensures that the lower score is recorded in the host collection.
This is important because it ensures that the score recorded in the host collection matches the score recorded in the beacon collection. However, there is a case where a max score from a previous import will not be updated on a subsequent import.
Consider a host with two peers: Alice and Bob. The following events occur:
Alice scores higher than Bob during the first import, so Alice's score is registered in the host collection.
In the next import, Alice scores significantly lower, making Bob the highest scorer for that run. However, Bob's score is still lower than Alice's score from the first import. Bob's score is pushed into the host collection as a new max score entry.
Now, the host collection entry has two max score entries: one for Alice from the first import and one for Bob from the second import. If you take the max of all these entries, you get Alice's entry. However, the score in Alice's host collection entry no longer matches the beacon record between the host and Alice, because that beacon's score went down during the second import.
I've attached a set of logs along with a script to reproduce the issue. Here are the IPs of concern:
The internal host to look at: 192.168.4.102
Alice, the peer whose beacon score went down: 167.248.49.102
Bob, the peer with the latest max beacon score: 198.137.202.56
Unzip the attached archive (test_files.zip) and run ./reproduce-bad-max-beacon.sh to generate a RITA dataset with this issue.
In this set of logs, the beacon score for 192.168.4.102 -> 167.248.49.102 (Alice) peaks at 100% during cid 4 but goes down to 50% during cid 5. The beacon score for 192.168.4.102 -> 198.137.202.56 (Bob) is marked as the maximum score during cid 5 with a score of 59.3%.
Unfortunately, there isn't any code in RITA to update the old max beacon record from chunk 4, so if you take the maximum of the max score records in the host collection for 192.168.4.102, you will retrieve the old and now incorrect record which points to Alice.
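The stale record can be modelled in a few lines. The sketch below is a simplified Python model, not RITA's actual code: the `beacons` dictionary, the `host_dat` list, and the `summarize` helper are hypothetical stand-ins for the beacon collection, the host's max score entries, and the summarize phase. It only uses the scores quoted above (100% and 50% for Alice, 59.3% for Bob).

```python
# Simplified model of the behavior described above; not RITA's actual code.
# `beacons` stands in for the beacon collection (latest score per peer) and
# `host_dat` for the max score entries stored for one host. All names here
# are hypothetical.

ALICE = "167.248.49.102"
BOB = "198.137.202.56"


def summarize(host_dat, beacons, chunk_scores, cid):
    """Apply one import: update beacon scores, then record the chunk's max."""
    beacons.update(chunk_scores)
    peer = max(chunk_scores, key=chunk_scores.get)
    for entry in host_dat:
        if entry["peer"] == peer:
            # The peer was a max scorer before: refresh its existing entry.
            entry["score"] = chunk_scores[peer]
            entry["cid"] = cid
            return
    # Otherwise push a new max score entry for this chunk.
    host_dat.append({"peer": peer, "score": chunk_scores[peer], "cid": cid})


beacons, host_dat = {}, []
summarize(host_dat, beacons, {ALICE: 1.0}, cid=4)
summarize(host_dat, beacons, {ALICE: 0.5, BOB: 0.593}, cid=5)

# Taking the max over the host's entries returns Alice's old cid 4 entry,
# even though the beacon collection now holds 0.5 for Alice.
stale = max(host_dat, key=lambda e: e["score"])
print(stale)
print(beacons[ALICE])
```

In this model, Alice's entry is only refreshed when Alice is again the top scorer of the chunk being summarized, which is why it is never corrected in cid 5.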
I see two ways we could approach fixing this issue.
1.) Only keep a single max score dat subdocument for each module in each host's host collection dat array and update it using all of the available data during the summary phase. Currently, we write a new max score subdocument for each chunk, and we only update it using the data from the current chunk.
2.) Fetch the updated score for each max score subdocument in the host collection and update them using the other collections. This seems expensive in terms of compute time, so I'm hesitant to go down this path.
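As a rough illustration of option 1, the sketch below keeps one max score entry per host and recomputes it from the latest score of every peer instead of from the current chunk alone. This is a simplified Python model under the same assumptions as the report's example scores; `summarize_single_max` and the dictionary layout are hypothetical and do not reflect RITA's actual schema or code.

```python
# Sketch of option 1: a single max score entry, recomputed from the latest
# score of every peer rather than from the current chunk alone.
# Hypothetical names; not RITA's actual schema or code.

ALICE = "167.248.49.102"
BOB = "198.137.202.56"


def summarize_single_max(beacons, chunk_scores, cid):
    """Apply one import and return the host's single max score entry."""
    beacons.update(chunk_scores)
    # Pick the max over all known peers, not just those in this chunk.
    peer = max(beacons, key=beacons.get)
    # `cid` records the chunk in which the entry was last recomputed.
    return {"peer": peer, "score": beacons[peer], "cid": cid}


beacons = {}
max_entry = summarize_single_max(beacons, {ALICE: 1.0}, cid=4)
max_entry = summarize_single_max(beacons, {ALICE: 0.5, BOB: 0.593}, cid=5)
print(max_entry)
```

With the scores from the attached logs, the single entry ends up pointing at Bob after cid 5, so it agrees with the beacon collection.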