InterNetNews / inn

INN (InterNetNews) Usenet server
https://www.isc.org/othersoftware/#INN

Can scanspool be fixed to handle continuation lines in Newsgroups header fields? #289

Closed: nelgin closed this issue 6 months ago

nelgin commented 7 months ago

I'm running scanspool and saw this:

vmsnet/networks/tcp-ip/ucx/1: does not belong in vmsnet.networks.tcp-ip.ucx according to its Newsgroups header field

Upon investigation, I found that vmsnet.networks.tcp-ip.ucx is there, but it is on a continuation line rather than on the Newsgroups: line itself.

From: Nomen Nescio <nobody@dizum.com>
Subject: Famed US hacker Kevin Mitnick dies aged 59
Message-ID: <ba124916fb33abfc27c3ea4d200c91db@dizum.com>
Date: Fri, 21 Jul 2023 08:08:49 +0200 (CEST)
Newsgroups: alt.privacy.anon-server, alt.hacker, comp.os.vms,
 vmsnet.networks.tcp-ip.ucx

Could scanspool, after the Newsgroups: line, keep reading the following lines until it reaches the next valid header field or the article body?
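For illustration, here is a minimal Python sketch of the requested behaviour, i.e. header unfolding as described in RFC 5322: a line beginning with a space or tab continues the previous header field, and the header block ends at the first empty line. This is not scanspool's code (scanspool is a Perl script); `parse_headers` and `newsgroups` are hypothetical names, and the parser is deliberately lenient (a repeated header field simply overwrites the earlier one).

```python
from typing import Dict, List


def parse_headers(article: str) -> Dict[str, str]:
    """Return the article's header fields, keyed by lowercase field name.

    Unfolding follows RFC 5322: a line that begins with a space or tab
    continues the previous field, and the header block ends at the first
    empty line. A repeated field simply overwrites the earlier one.
    """
    headers: Dict[str, str] = {}
    name = None
    for line in article.splitlines():
        if not line:
            break  # empty line: end of headers, start of body
        if line[0] in " \t":
            if name is not None:
                # Drop the line break but keep the folding whitespace.
                headers[name] += line
            continue
        field, sep, value = line.partition(":")
        if not sep:
            name = None  # not a "Name: value" line; ignore it and its continuations
            continue
        name = field.strip().lower()
        headers[name] = value.strip()
    return headers


def newsgroups(headers: Dict[str, str]) -> List[str]:
    """Split an (already unfolded) Newsgroups field into group names."""
    return [g.strip() for g in headers.get("newsgroups", "").split(",") if g.strip()]


article = (
    "From: Nomen Nescio <nobody@dizum.com>\n"
    "Newsgroups: alt.privacy.anon-server, alt.hacker, comp.os.vms,\n"
    " vmsnet.networks.tcp-ip.ucx\n"
    "\n"
    "Body text\n"
)
print(newsgroups(parse_headers(article)))
# ['alt.privacy.anon-server', 'alt.hacker', 'comp.os.vms', 'vmsnet.networks.tcp-ip.ucx']
```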

Maybe the proper thing to do would be to reject the article due to a malformed header, or to fix it manually, which I am reluctant to do. I don't want to be known as the news admin who goes around altering articles, for whatever reason; that would destroy the integrity of Usenet. They all seem to be coming from the same source at dizum.com.

nelgin commented 7 months ago

Oh, I just saw this in the man page:

scanspool only considers the first line of the Newsgroups: header field. Continuation lines are not taken into account.

Maybe this could be considered an enhancement request.

Julien-Elie commented 7 months ago

Exactly, this is a known limitation. It could be considered either an enhancement request, yes, or a parsing bug!

Julien-Elie commented 7 months ago

Incidentally, it is a duplicate of #193. (I'm closing the previous ticket I once opened, to only keep yours.)

Julien-Elie commented 7 months ago

I've just added a few features to scanspool, following our latest discussions. It now detects empty files in a tradspool news spool and directories with an all-digit component (which may conflict with a file of the same name), correctly parses continuation lines in header fields, and can automatically remove articles reported as having a problem (when run with the new -r flag). Before trying -r, please make sure the list of reported articles looks good and contains no false positives. If you find articles that are incorrectly reported, that is a bug, so please report them.
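The per-article checks described above can be sketched roughly as follows for a tradspool layout, where an article stored at `vmsnet/networks/tcp-ip/ucx/1` belongs to the group `vmsnet.networks.tcp-ip.ucx`. This Python sketch is only an illustration under that assumption and is not scanspool's actual Perl implementation; `group_from_path` and `check_article` are hypothetical names, and the caller is assumed to supply the file size and the fully unfolded list of groups from the Newsgroups header field.

```python
from typing import List, Tuple


def group_from_path(relpath: str) -> Tuple[str, str]:
    """Split a spool-relative tradspool path such as
    'vmsnet/networks/tcp-ip/ucx/1' into ('vmsnet.networks.tcp-ip.ucx', '1')."""
    directory, _, artnum = relpath.rpartition("/")
    return directory.replace("/", "."), artnum


def check_article(relpath: str, size: int, groups: List[str]) -> List[str]:
    """Return a list of problems found for one article file.

    `size` is the file size in bytes and `groups` the group names taken from
    the fully unfolded Newsgroups header field.
    """
    problems: List[str] = []
    group, _artnum = group_from_path(relpath)
    if size == 0:
        problems.append("empty file")
    # A directory named only with digits (e.g. comp/os/2) could clash with
    # an article file of the same name in the parent group's directory.
    if any(component.isdigit() for component in group.split(".")):
        problems.append("directory with an all-digit component")
    if group not in groups:
        problems.append(
            f"does not belong in {group} according to its Newsgroups header field"
        )
    return problems


# Only the groups from the first header line: reproduces the original report.
print(check_article("vmsnet/networks/tcp-ip/ucx/1", 1234,
                    ["alt.privacy.anon-server", "alt.hacker", "comp.os.vms"]))
```

Passing all four groups, including the one on the continuation line, returns an empty list instead.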

To test, just download the current version (https://raw.githubusercontent.com/InterNetNews/inn/main/frontends/scanspool.in) and replace the first two lines of the script with the ones from your current scanspool program. You can then run it.
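If you prefer not to edit the downloaded file by hand, the line replacement can be scripted. The following Python sketch is an illustration only: the function name `splice_first_lines` and the example paths in the comment are hypothetical (where scanspool is installed depends on your INN configuration).

```python
import shutil
from pathlib import Path


def splice_first_lines(installed: str, downloaded: str, output: str, n: int = 2) -> None:
    """Write `downloaded` to `output` with its first `n` lines taken from the
    `installed` script, then copy the installed script's permission bits."""
    # Work on bytes so the script's encoding and line endings are preserved.
    old_lines = Path(installed).read_bytes().splitlines(keepends=True)
    new_lines = Path(downloaded).read_bytes().splitlines(keepends=True)
    Path(output).write_bytes(b"".join(old_lines[:n] + new_lines[n:]))
    shutil.copymode(installed, output)  # keep the executable bit


# Example call (hypothetical paths, adjust to your installation):
# splice_first_lines("/usr/local/news/bin/scanspool", "scanspool.in", "scanspool.new")
```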