InterNetNews / inn

INN (InterNetNews) Usenet server
https://www.isc.org/othersoftware/#INN

Invalid argument with makehistory and 0 length articles #288

Closed nelgin closed 5 months ago

nelgin commented 6 months ago

Somehow, my innd setup got corrupted again so I'm rebuilding the index, however I have got a few "Invalid argument" errors.

news@www:~$ time makehistory -O -x -F
makehistory: tradspool: could not mmap article /news/spool/articles/free/pt/5285: Invalid argument
makehistory: tradspool: could not mmap article /news/spool/articles/ger/ct/4083: Invalid argument
makehistory: tradspool: could not mmap article /news/spool/articles/sci/physics/relativity/1788: Invalid argument

When I look at these files, they are empty.

news@www:~$ ls -l /news/spool/articles/free/pt/5285 /news/spool/articles/ger/ct/4083 /news/spool/articles/sci/physics/relativity/1788
-rw-rw-r-- 1 news news 0 Nov 16 2021 /news/spool/articles/free/pt/5285
-rw-rw-r-- 1 news news 0 Nov 16 2021 /news/spool/articles/ger/ct/4083
-rw-rw-r-- 1 news news 0 Nov 16 2021 /news/spool/articles/sci/physics/relativity/1788

Should the procedure for makehistory include removing any 0 length or invalid articles first? Could makehistory be told to remove any invalid articles? And... I wonder why I've never seen this before, considering those files are 2 years old.

Can I just delete these and move on or would I have to rebuild the index again or...?

Julien-Elie commented 6 months ago

I can reassure you that you don't need to rebuild your history again. These empty files were just skipped by makehistory. Yes, you can just remove these empty files. There's unfortunately no way to recover their contents, and I don't know why they were created empty (or became empty).
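As a minimal sketch of that cleanup (not something makehistory or scanspool does for you), the empty files can be listed with plain find before anything is deleted. The function name list_empty_articles is hypothetical, the spool root is the one shown in this thread, and the numeric-name filter assumes tradspool article files are named by article number.

```shell
# List zero-length article files under a tradspool tree.
# $1 = spool root (/news/spool/articles in this thread).
# Assumption: article files have purely numeric names, so only names
# starting with a digit are considered.
list_empty_articles() {
    find "$1" -type f -name '[0-9]*' -size 0 -print
}

# Review the list first, then delete only once it looks right, e.g.:
#   list_empty_articles /news/spool/articles
#   list_empty_articles /news/spool/articles | while IFS= read -r f; do rm -- "$f"; done
```

Listing and deleting are kept as separate steps on purpose, so that nothing is removed before the list has been checked.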

As for your question about automatically removing these files, I am unsure it is a role that makehistory should have. I'm wondering whether adding an option to scanspool would be useful to achieve that for tradspool. For instance, running scanspool -r would remove articles reported in error. At the same time, scanspool could gain the ability to report empty files as an error (this is not currently checked, and I believe it should be, as it already reports articles consisting of only headers, without any body).

nelgin commented 6 months ago
Dec 18 21:59:04 www innd: tradspool: could not open /news/spool/articles/news/test/1: File exists
Dec 18 21:59:04 www innd: SERVER cant store article: File exists
Dec 18 21:59:04 www innd: tradspool: could not open /news/spool/articles/news/test/2: File exists
Dec 18 21:59:04 www innd: SERVER cant store article: File exists
innd: tradspool: could not open /news/spool/articles/news/test/1: File exists
innd: tradspool: could not open /news/spool/articles/news/test/2: File exists

I was getting these errors in my log. According to https://www.eyrie.org/~eagle/faqs/inn.html#S4.4 one way to resolve the issue is the general solution of running makehistory -O -x -F.

At that time I didn't have the cycles to go through and deal with each group so I just let it run. It only takes about 45 minutes.

I didn't know about scanspool...now that brings up a whole new different set of questions!

Julien-Elie commented 6 months ago

Running scanspool will give you all the files that have a higher article number than they should have.

nelgin commented 6 months ago

I see a bunch of these

it/test/42534: article number is too low
it/test/23860: article number is too low
newsreader/test/114: article number is too low
newsreader/test/180: article number is too low
hamster/de/config/55

But not much on what to do about them.

Julien-Elie commented 6 months ago

They should have expired and been removed, but somehow were not. These articles are no longer retrievable by news clients, as they are not present in the overview data for these groups, unless the client knows the exact Message-ID to search for. Note that they may be crossposted articles which are still readable from another newsgroup, in which case another copy (hardlink) of these articles is normally on your disk.
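One way to check the crosspost remark before touching anything is to look at the hard link count of each reported file: a count above 1 suggests another name for the same article exists elsewhere in the spool. This is only a sketch. The function name report_low_articles is hypothetical, it assumes scanspool prints spool-relative paths in the form shown above, and it takes the spool root as an argument.

```shell
# Read scanspool-style output on stdin and print the hard link count of
# every file reported as "article number is too low".
# $1 = spool root (/news/spool/articles in this thread).
report_low_articles() {
    spool=$1
    # Keep only the "too low" lines and strip the message, leaving the path.
    sed -n 's/: article number is too low$//p' |
    while IFS= read -r rel; do
        file=$spool/$rel
        [ -f "$file" ] || continue
        # The second field of "ls -l" output is the hard link count.
        links=$(ls -ld "$file" | awk '{print $2}')
        printf '%s links=%s\n' "$file" "$links"
    done
}

# Example (assumed invocation):
#   scanspool | report_low_articles /news/spool/articles
```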

Julien-Elie commented 6 months ago

On second thoughts, I am unsure scanspool (or makehistory) should implement automatic removal of invalid articles. We should not wrongly remove articles; a bug or an oversight in the detection rules could lead to false positives being removed. Besides, not all articles may be removable, depending on the storage method.

Suggestions about the remarks made in this discussion:

Do not hesitate to tell me if that sounds good to you, or if you see other improvements to make.

nelgin commented 6 months ago

> They should have expired and been removed, but somehow were not. These articles are no longer retrievable by news clients, as they are not present in the overview data for these groups, unless the client knows the exact Message-ID to search for. Note that they may be crossposted articles which are still readable from another newsgroup, in which case another copy (hardlink) of these articles is normally on your disk.

Various parts of inn throw up errors that are not well documented, or not documented at all. If you know, you know; otherwise you're out of luck. For a new or occasional admin like myself, that makes it difficult to know what to do, especially when your server is down (which is why I ended up posting here).

Julien-Elie commented 6 months ago

> Various parts of inn throw up errors that are not well documented, or not documented at all. If you know, you know; otherwise you're out of luck. For a new or occasional admin like myself, that makes it difficult to know what to do, especially when your server is down (which is why I ended up posting here).

Yes, I totally understand your point and you're not the only one to express it. I try to improve what I can when I see questions or bug reports like yours.

As a side note, it is not that easy to respond to the needs of all kinds of users, from the very technical one wanting the exact error to the news admin just wanting to know directly what he can do to fix it, and even better if it could be sort of auto-fixed. Same thing for the documentation: some people want all the details, but when they are included, other people complain that the documentation is too lengthy and too technical. I understand all these points but unfortunately, given the very low number of active contributors on the project, we cannot do everything (like writing two sets of documentation, creating and maintaining diagrams, fixing bugs, investigating issues, improving logs, etc.). We try to do our best :)

(INN is a pretty complex news server with about 150k lines of code, excluding comments and blank lines, and 120 manual pages at the time I'm writing in 2023.)

nelgin commented 6 months ago

120 manual pages is probably half the problem. Most people are not likely to know they're even there. scanspool is a good example. I know that a straight text faq is easy to send out via usenet, but maybe a wiki would be useful so documents can be split up and searched, and others can add useful stuff to it like procedures like:

- Steps I need to take to add a new peer
- I need to change innfeed.conf, incoming.conf, newsfeeds - now what?
- How can I import articles from a service like Giganews without propagating them to my peers

etc. Right now information is scattered between man pages, FAQs, and various newsgroups.

Julien-Elie commented 6 months ago

> 120 manual pages is probably half the problem. Most people are not likely to know they're even there. scanspool is a good example. I know that a straight text faq is easy to send out via usenet, but maybe a wiki would be useful so documents can be split up and searched

That's a good point. I once thought of compiling the manual pages into a PDF file so that they could be easily searched. It may be a starting point.

> and others can add useful stuff to it

Note that contributions to existing manual pages and the FAQ are always welcome. Suggestions of additional wording, examples or procedures have always been possible, and are still possible. These contributions will be taken into account and quickly added to them.

As for wikis, I agree they are more interactive and very useful. The fact is that, over the years, I've seen several of them (3 or 4). They were advertised a few times but nobody contributed except for their initial author... So wikis are not a panacea, nor the answer to everything. For example, the last one, created two years ago, is http://www.dodin.org/wiki/pmwiki.php?n=Doc.ConfigurerINN-2021 and you can of course contribute to it. (Another one, usenet-fr.redatomik.org, in French, is no longer online...)

> Steps I need to take to add a new peer I need to change innfeed.conf, incoming.conf, newsfeeds - now what?

What information exactly are you looking for? Does your "now what?" question correspond to the last entry of the FAQ, "6.14. Find external feeds and set up peering"? It points to the news.admin.peering newsgroup and references the manual pages of these 3 configuration files, each of which starts with a working example (the "In a nutshell" section). See for instance https://www.eyrie.org/~eagle/software/inn/docs/innfeed.conf.html

I am unsure what you would have wanted. (Is it a dedicated web page in a wiki concatenating the 3 "In a nutshell" sections and entry 6.14 of the FAQ? That is sort of what the wikis linked above had. They just need to be revived and more widely known.)

> How can I import articles from a service like Giganews without propagating them to my peers

Interesting question. Thanks for having mentioned it. This is not explicitly said in manual pages AFAIK. I've added the information in the pullnews manual page:

In case you have running peers and don't want to propagate them the articles you are pulling from upstream servers, you should add a fake hop with the -F flag to all the pulled articles, and add that very fake hop in the exclusion sub-field of all the sites configured in your newsfeeds file. (For example, using pullnews -F myserverimported, change sitename:*:Tm:innfeed! to sitename/myserverimported:*:Tm:innfeed! for every sitename in newsfeeds you don't want to feed the pulled articles to.)

I hope it is the sort of information you were looking for.
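The newsfeeds change described in that pullnews note can be previewed with a small filter before editing the real file. This is a rough sketch only: the function name add_exclusion is hypothetical, it handles just the simple single-line case from the example above (a site field with no existing exclusion list), and it writes to standard output rather than modifying newsfeeds.

```shell
# Read newsfeeds text on stdin and print it with the fake hop added to
# the exclusion sub-field of one site.
# $1 = site name as it appears in newsfeeds, $2 = fake hop passed to pullnews -F.
# Assumption: the entry is on one line and has no exclusion list yet.
add_exclusion() {
    awk -F: -v OFS=: -v site="$1" -v hop="$2" '
        $1 == site { $1 = site "/" hop }   # sitename -> sitename/hop
        { print }
    '
}

# Example:
#   printf '%s\n' 'sitename:*:Tm:innfeed!' | add_exclusion sitename myserverimported
```

Entries that already carry an exclusion list, or that continue over several lines, are left unchanged by this sketch and would need to be edited by hand.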