cybergreen-net / pm

Tech project management repo (issue tracker only)

aggregation: please count every IP only once within an interval #42

Closed aaronkaplan closed 8 years ago

aaronkaplan commented 8 years ago

When aggregating (let's say the time window is 1 week), please only count every IP address once for a given risk/ASN/time window.

This is not ideal: it introduces errors of its own (DHCP churn, for example, lets one host show up under several IPs), but it is better than overshooting in our data by counting the same IP multiple times due to scanning intervals.

In other words:

SELECT
    COUNT(DISTINCT ip) AS cnt,
    time_window AS week,
    ASN
FROM $table
WHERE ...
GROUP BY week, ASN
ORDER BY ...
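
To make the effect concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database. The table name `scans`, its columns, the risk label, and the sample rows are all made up for illustration and are not the real aggregator schema. It also groups by risk, on the assumption that risk is a column rather than something handled in the WHERE clause.

```python
import sqlite3


def count_distinct_ips(rows):
    """Return {(week, asn, risk): (distinct_ips, raw_rows)} for scan rows.

    Each row is (ip, asn, risk, week). The schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scans (ip TEXT, asn INTEGER, risk TEXT, week TEXT)"
    )
    conn.executemany("INSERT INTO scans VALUES (?, ?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT week, asn, risk, "
        "       COUNT(DISTINCT ip) AS cnt, "  # each IP once per group
        "       COUNT(*) AS raw "             # naive count, for comparison
        "FROM scans "
        "GROUP BY week, asn, risk "
        "ORDER BY week, asn, risk"
    )
    result = {
        (week, asn, risk): (cnt, raw) for week, asn, risk, cnt, raw in cur
    }
    conn.close()
    return result


# Illustrative data: 192.0.2.1 is seen three times in the same week
# because the scanner ran several times within the window.
ROWS = [
    ("192.0.2.1", 64500, "openntp", "2016-W39"),
    ("192.0.2.1", 64500, "openntp", "2016-W39"),
    ("192.0.2.1", 64500, "openntp", "2016-W39"),
    ("192.0.2.2", 64500, "openntp", "2016-W39"),
    ("192.0.2.1", 64500, "openntp", "2016-W40"),  # new week, counted again
    ("198.51.100.7", 64501, "openntp", "2016-W39"),
]

if __name__ == "__main__":
    for key, (cnt, raw) in count_distinct_ips(ROWS).items():
        print(key, "distinct:", cnt, "raw rows:", raw)
```

With this sample data, the first group holds four raw rows but only two distinct IPs, which is the overshoot the distinct count is meant to avoid.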

Note: this is a Sept-30th fix. We will need to revisit this topic later on.

aaronkaplan commented 8 years ago

so can this one be closed then as well?

rufuspollock commented 8 years ago

@zelima please close if you think this is fixed.

zelima commented 8 years ago

This is fixed, see commit: https://github.com/cybergreen-net/aggregator/commit/36e93917388aa0e45a9554bad0a68ea454fc5e42 @aaronkaplan @rgrp I cannot close issues here.

rufuspollock commented 8 years ago

FIXED.