book / CPANio

CPAN.io
http://cpan.io/

Periodic page generation frequency and timestamps question #32

Closed (perlancar closed this issue 6 years ago)

perlancar commented 9 years ago

I've seen the footer for the board pages several times. For example: "Page generated on 2015-05-10 at 01:14 UTC, based on all CPAN distributions released until 2015-05-09 at 12:24 UTC. Data from the CPAN Testers BackPAN indexes." From what I can see (and correct me if I'm wrong), the page generated timestamp is usually much more recent and more frequently updated (hourly?) than the CPAN distributions released timestamp (only up to several times a day?). I'm wondering: if the CPAN distributions data has not changed, is it necessary to update the page?

book commented 9 years ago

Indeed, CPAN.io is a static web site generated once an hour by a cron job.

You're right that it's probably not necessary to update the page if the distribution data has not changed. What changes, though, is the current time, which affects whether an author is considered to have "missed" a period. And that can lead to an inconsistent board when up-to-date distribution data is fetched.

So yes, I should probably not update the page if there are no new distributions since the previous run. There are a few other similar optimizations that can be done in the site (for example the pages under /ref/ are effectively regenerated every hour).
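For illustration, the "skip the run when nothing is new" check could look roughly like the sketch below. It is written in Python only to show the idea and is not CPAN.io's actual code; the function name `needs_regeneration` and both of its parameters are hypothetical. It also deliberately ignores the current-time effect on "missed" periods mentioned above.

```python
from datetime import datetime, timezone


def needs_regeneration(latest_release, last_generated):
    """Return True when the pages should be rebuilt.

    latest_release: timestamp of the newest known distribution (hypothetical input).
    last_generated: timestamp of the previous run, or None if there was none.
    """
    if last_generated is None:
        # No previous run recorded: always generate.
        return True
    # Rebuild only if something was released after the previous run.
    return latest_release > last_generated


# Timestamps taken from the footer quoted earlier in this thread.
released = datetime(2015, 5, 9, 12, 24, tzinfo=timezone.utc)
generated = datetime(2015, 5, 10, 1, 14, tzinfo=timezone.utc)
skip_run = not needs_regeneration(released, generated)
```

With the footer's timestamps, the latest release predates the last generation, so the hourly run could be skipped.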

book commented 6 years ago

Given that the script runs on a mostly idle box, it's unlikely that I will make that optimisation.

book commented 6 years ago

a3178fc2a5c7aa31d9832974008d31e9b560dad3 generates 304 responses when the latest release is from before the latest dashboard generation. Combined with Wallflower's improved support for 304, the site generation should be faster, and pages should be re-generated only when the source data has changed.
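A rough sketch of the kind of check this describes, assuming the site generator sends an `If-Modified-Since` header carrying the time of the previous generation. This is not the code from the commit; `conditional_status` and its parameters are hypothetical names, and Python is used only for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, latest_release):
    """Return 304 when the existing page is at least as new as the
    latest release, otherwise 200 (the page must be regenerated)."""
    if not if_modified_since:
        return 200  # no conditional header: always generate
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: fall back to a full response
    if cached_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    return 304 if latest_release <= cached_at else 200


# Timestamps taken from the footer quoted at the top of the thread.
header = format_datetime(
    datetime(2015, 5, 10, 1, 14, tzinfo=timezone.utc), usegmt=True
)
status = conditional_status(
    header, datetime(2015, 5, 9, 12, 24, tzinfo=timezone.utc)
)
```

Here the latest release is older than the previous generation, so the sketch answers 304 and the existing static file can be left untouched.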