Open Julien-Elie opened 11 months ago
Speaking about the article number limit, an evolution of the current timecaf method could be the addition of storage.conf options to parameterize it. For instance with maxart
and maxtime
options to specify the number of articles per CAF file and the number of seconds before creating a new CAF file (which could for instance be a multiple of 256 seconds so as to keep the current file naming scheme). If maxtime
is set to 0, a new file is created when maxart
articles have been received.
The CAF file could also be dynamically extended if maxtime
is strictly positive, so as to keep receiving articles.
The CAF file header contains a _timet field (LastCleaned). It should be switched to _uint64t so as to be independent from the size of _timet which may vary on 32-bit architectures (i386) where _timet could be 4 or 8 bytes long depending on compilation options or the kernel used. As a 64-bit _timet transition is likely to occur on 32-bit architectures so as to avoid the Y2038 issue, INN should be resilient to that change.
A tool converting the CAF files appropriately should be provided and run by innupgrade if it detects a CAF file header in an old format.
At the same time, the _sizet and _offt variables in the CAF file header could also be changed to _uint64t (see #82).
As space is allotted for a maximum of 262144 (2^18) articles in current CAF files, people complain with that limit from time to time (when they inject at high speed articles in their news spool, more than 262144 articles can arrive in a time frame of 4 minutes, which leads to storage failures). While we're at changing the CAF file header, we could increase the allotted space to handle for instance 4,194,304 (2^22) articles. Handling 64-bit _timet would then be an opportunity to improve timecaf 🙂
Note that articles are stored in files named
<patharticles>/timecaf-nn/bb/aacc.CF
where the arrival time is read as0xaabbccdd
only on 4 bytes (using htonl on a _timet value). The current code seems to work fine until 2106 (and should be tested to make sure - by adding a test in the test suite), which gives enough time to eventually change the naming convention. Storage tokens should be changed at the same time. (A similar change should also be carried for the timehash storage method). Let's focus for the time being on the CAF header.