InterNetNews / inn

INN (InterNetNews) Usenet server
https://www.isc.org/othersoftware/#INN

Better grouping by domain in readership statistics by innreport #305

Closed Julien-Elie closed 4 months ago

Julien-Elie commented 4 months ago

For better readability, and for more useful per-domain statistics, the way innreport groups hosts by domain should be improved. The current code only removes the first component of the hostname, which is not enough in cases like ec2-13-236-155-204.ap-southeast-2.compute.amazonaws.com and ec2-18-184-57-186.eu-central-1.compute.amazonaws.com: these end up on separate lines, one per AWS datacenter, whereas we just want a single line reporting AWS connections from *.compute.amazonaws.com. The same applies to other domains. Below is a current example:

NNRP readership statistics (by domain):
System                         Conn   Arts      Size Groups Post  Rej   Elapsed
*.akk.kit.edu                     5     23  291.8 KB      6    0    0  00:04:40
?                                 3      9   12.2 KB      3    0    0  00:00:16
unresolved                       16      6    8.1 KB      2    0    0  00:17:51
*.ipv6.abo.wanadoo.fr            26      0    0.0 KB      0    0    0  14:41:09
*.hsd1.ca.comcast.net            18      0    0.0 KB      0    0    0  04:55:12
*.monitoring.internet-measurem    2      0    0.0 KB      0    0    0  00:00:03
*.w90-3.abo.wanadoo.fr            2      0    0.0 KB      0    0    0  01:03:01
*.210.203.35.bc.googleusercont    1      0    0.0 KB      0    0    0  00:00:00
*.211.203.35.bc.googleusercont    1      0    0.0 KB      0    0    0  00:00:00
*.8.41.38.static.mds-telecom.n    1      0    0.0 KB      0    0    0  00:00:02
*.sfr.lns.abo.bbox.fr             1      0    0.0 KB      0    0    0  00:00:01
*.static.grandenetworks.net       1      0    0.0 KB      0    0    0  00:00:10
*.stretchoid.com                  1      0    0.0 KB      0    0    0  00:00:09

To improve the report, we'll keep at most the last 3 components of the hostname. (Some country code top-level domains have 2 components, like .co.uk, so we need a minimum of 3 components; this looked fine in my testing.)
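As an illustration of the grouping rule only, here is a minimal Python sketch. It is not innreport's actual code (innreport is a Perl script); the function name group_domain and its keep parameter are hypothetical, and leaving hostnames with 3 or fewer components untouched is my assumption.

```python
def group_domain(hostname: str, keep: int = 3) -> str:
    """Collapse a hostname to a wildcard over its last `keep` components.

    Hypothetical helper: hostnames that already have `keep` components
    or fewer are returned unchanged (lowercased, trailing dot removed).
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) <= keep:
        return ".".join(labels)
    # Drop everything before the last `keep` labels and mark it with "*."
    return "*." + ".".join(labels[-keep:])


if __name__ == "__main__":
    for host in (
        "ec2-13-236-155-204.ap-southeast-2.compute.amazonaws.com",
        "ec2-18-184-57-186.eu-central-1.compute.amazonaws.com",
        "host.example.co.uk",
    ):
        print(host, "->", group_domain(host))
```

With this rule, both EC2 hostnames from the description fall under *.compute.amazonaws.com, and a .co.uk host still keeps its registered name (*.example.co.uk).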

The unknown domain ? corresponds to an unresolved IPv6 address. I see that innreport currently only has code to recognize IPv4 addresses. This should also be fixed as part of improving these statistics.
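One possible way to treat IPv4 and IPv6 literals alike is sketched below in Python, using the standard ipaddress module. This is not how innreport does it: the names is_ip_literal and report_key are hypothetical, and mapping every address literal to the "unresolved" line is an assumption based on the report above.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """Return True if `host` is an IPv4 or IPv6 address rather than a name."""
    try:
        # Strip optional brackets, as in "[2001:db8::1]".
        ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return False
    return True


def report_key(host: str) -> str:
    """Hypothetical helper: file address literals under "unresolved"."""
    return "unresolved" if is_ip_literal(host) else host


if __name__ == "__main__":
    for host in ("192.0.2.10", "2001:db8::1", "news.example.org"):
        print(host, "->", report_key(host))
```

The addresses in the usage example come from the documentation ranges (192.0.2.0/24 and 2001:db8::/32), not from real logs.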