cpan-testers / cpantesters-project

A meta-project for tracking CPAN Testers project goals

Fixing the reports summaries #9

Closed. preaction closed this issue 8 years ago.

preaction commented 8 years ago

The reports summaries were offline for quite a while. This affected the version summary bars on the reports website, and the release SQLite database (which is generated from the same data, but using a different process).

We've fixed it, but we should track it here to ensure everything's wrapped up.

Related issues:

preaction commented 8 years ago

Everything here seems cleared up, so I'm closing this. Writing a better API for the reports summaries is a different issue.

preaction commented 8 years ago

And because I said it, it broke again. Reopening.

preaction commented 8 years ago

The release summaries are generated by /var/www/reports/toolkit/reports-release.sh, which is called from cron. That script calls /home/barbie/bin/process-controller.pl (not on CPAN, as far as I can see) to manage a process configured by /var/www/reports/toolkit/reports-release.ini. The controller first kills all existing instances of reports-release.pl using killall, then runs perl reports-release.pl --update from the /var/www/reports/toolkit directory. Output is logged to /var/www/reports/toolkit/logs/release-run.log.

The log shows a number of database errors, rendered as HTML, so it must be using Labyrinth to do its work. The log also shows that it stopped processing new records around April 25, 2016 (the last increase was 2016/04/25 16:31:41 .. summary max=67995474, data max=67995474; we are now at 73,000,000 records). That was during the QA Hackathon, so it doesn't seem to have worked since then.
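Finding when the summary stopped advancing comes down to scanning release-run.log for the last line where the summary max grew. A rough Python sketch, assuming every progress line has the same shape as the single excerpt quoted above (timestamp, "..", then summary max and data max); the function name last_increase is hypothetical. Lines that don't match, such as the HTML error output, are skipped, and the first matching line is treated as the starting point.

```python
import re
from typing import Iterable, Optional, Tuple

# ASSUMPTION: progress lines look like the one excerpt we have:
#   "2016/04/25 16:31:41 .. summary max=67995474, data max=67995474"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \.\. "
    r"summary max=(?P<summary>\d+), data max=(?P<data>\d+)"
)


def last_increase(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (timestamp, summary_max) for the last line where summary max grew."""
    previous = None
    result = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip HTML error output and anything else unrecognised
        summary = int(m.group("summary"))
        # The first matching line is the baseline; later lines count only if they grow.
        if previous is None or summary > previous:
            result = (m.group("ts"), summary)
        previous = summary
    return result
```

Feeding it the log file line by line (for example, `last_increase(open(path))`) returns the timestamp after which nothing new was summarised, or None if no line matched the assumed format.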

These records are in the cpanstats.release_summary table. I need to find what updates this table.

preaction commented 8 years ago

perl reports-release.pl --update runs the Update sub from Labyrinth::Plugins::CPAN::Release. It reads the summary max from SELECT MAX(id) FROM cpanstats.release_summary and the data max from SELECT MAX(id) FROM cpanstats.release_data, so it seems release_summary is updated from release_data. So what updates release_data?
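The high-water-mark pattern described here can be sketched as follows. This is not the actual Labyrinth::Plugins::CPAN::Release code (which is Perl running against MySQL); it's a minimal Python/SQLite illustration, assuming hypothetical columns dist, version and state on release_data and dist, version, pass and fail on release_summary. It only processes release_data rows whose ids fall between the two maxima, which is also why the summary stalls once release_data stops growing.

```python
import sqlite3


def update_summary(conn: sqlite3.Connection) -> int:
    """Fold release_data rows newer than the summary high-water mark into
    release_summary. Returns the number of release_data rows processed.

    ASSUMPTION: hypothetical schemas
      release_data(id, dist, version, state)
      release_summary(id, dist, version, pass, fail)
    """
    cur = conn.cursor()
    # The two maxima compared by --update.
    summary_max = cur.execute("SELECT COALESCE(MAX(id), 0) FROM release_summary").fetchone()[0]
    data_max = cur.execute("SELECT COALESCE(MAX(id), 0) FROM release_data").fetchone()[0]
    if data_max <= summary_max:
        # Nothing new: the state the log shows when release_data stops growing.
        return 0
    rows = cur.execute(
        "SELECT id, dist, version, state FROM release_data "
        "WHERE id > ? AND id <= ? ORDER BY id",
        (summary_max, data_max),
    ).fetchall()
    for report_id, dist, version, state in rows:
        pass_inc = 1 if state == "pass" else 0
        fail_inc = 1 if state == "fail" else 0
        # Bump the existing summary row, moving its id forward to this report.
        updated = cur.execute(
            "UPDATE release_summary SET id = ?, pass = pass + ?, fail = fail + ? "
            "WHERE dist = ? AND version = ?",
            (report_id, pass_inc, fail_inc, dist, version),
        ).rowcount
        if updated == 0:
            # First report seen for this dist/version.
            cur.execute(
                "INSERT INTO release_summary (id, dist, version, pass, fail) "
                "VALUES (?, ?, ?, ?, ?)",
                (report_id, dist, version, pass_inc, fail_inc),
            )
    conn.commit()
    return len(rows)
```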

preaction commented 8 years ago

The release_data table is created by /var/www/reports/toolkit/reports-release-create.sh, which manages a process that eventually calls perl reports-release.pl --create. This needs to run from cron to keep the release data in sync.

preaction commented 8 years ago

I added a new line to the barbie user's crontab that runs /var/www/reports/toolkit/reports-release-create.sh about 90 minutes before /var/www/reports/toolkit/reports-release.sh, which should be enough time for the release data to be created.
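Working out the earlier crontab entry is simple minute arithmetic, but it has to wrap around midnight. A small sketch, assuming both scripts run once a day (the real schedule isn't recorded here); the helper names are hypothetical.

```python
from typing import Tuple

DAY_MINUTES = 24 * 60


def offset_schedule(hour: int, minute: int, offset_minutes: int = 90) -> Tuple[int, int]:
    """Return the (hour, minute) that is offset_minutes earlier, wrapping past midnight."""
    total = (hour * 60 + minute - offset_minutes) % DAY_MINUTES
    return divmod(total, 60)


def daily_crontab_line(hour: int, minute: int, command: str) -> str:
    """Format a once-a-day crontab entry: minute hour day-of-month month day-of-week command."""
    return f"{minute} {hour} * * * {command}"
```

For a hypothetical release job at 01:00, the create job would land at 23:30 the previous evening.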

It is currently running, but it has 5 million records to get through this first time.

We'll give this a week and check on it again.

preaction commented 8 years ago

The script that moves data from cpanstats.release_summary into a SQLite database is /opt/projects/cpantesters/release/bin/release.pl, which is executed by /opt/projects/cpantesters/autorun-back3.sh. release.pl creates the SQLite database at /opt/projects/cpantesters/release/data/release.db. autorun-back3.sh then copies the database to /opt/projects/cpantesters/db/release.db, and copies it twice more to /opt/projects/cpantesters/dbx/release.db, compressing one copy with gzip and the other with bzip2. These compressed versions are then moved to /var/www/cpandevel/release, which is served at http://devel.cpantesters.org/release/release.db.gz and http://devel.cpantesters.org/release/release.db.bz2.
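The copy-and-compress step can be sketched in Python with the standard gzip and bz2 modules. This isn't the actual autorun-back3.sh: it streams each compressed copy directly instead of copying release.db into dbx/ and then running gzip and bzip2 on it, and the function name and directory parameters are hypothetical stand-ins for the paths listed above.

```python
import bz2
import gzip
import shutil
from pathlib import Path
from typing import List


def publish_release_db(built: Path, db_dir: Path, dbx_dir: Path, web_dir: Path) -> List[Path]:
    """Copy a freshly built release.db and publish gzip/bzip2 versions of it.

    HYPOTHETICAL stand-in for autorun-back3.sh; directory arguments map to
    db/, dbx/ and the web-served release directory.
    """
    for d in (db_dir, dbx_dir, web_dir):
        d.mkdir(parents=True, exist_ok=True)
    # 1. Copy the built database into db/.
    db_copy = db_dir / "release.db"
    shutil.copyfile(built, db_copy)
    published = []
    # 2. Write a compressed copy into dbx/, then move it to the web directory.
    for suffix, opener in ((".gz", gzip.open), (".bz2", bz2.open)):
        staged = dbx_dir / f"release.db{suffix}"
        with open(db_copy, "rb") as src, opener(str(staged), "wb") as dst:
            shutil.copyfileobj(src, dst)
        target = web_dir / staged.name
        shutil.move(str(staged), str(target))
        published.append(target)
    return published
```

Streaming avoids keeping two uncompressed copies in dbx/ at once; the published files are the same either way.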

I found an issue in reports-release.ini and reports-release-create.ini that was preventing one job from running while the other was running. Both were searching the ps aux output for just the string reports-release.pl, which matches both jobs because they run the same script (with different flags). I added --update and --create to their respective search strings so they no longer find each other. Letting both run at once might not be ideal given the load on the server, but I made the change anyway so I can see data flowing into the release_summary table.
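The self-matching problem is easy to reproduce: a plain substring search for reports-release.pl matches both jobs, while including the flag separates them. A minimal Python sketch, assuming standard Linux ps aux output (ten fixed columns followed by COMMAND); matching_pids is a hypothetical helper, not what process-controller.pl actually does internally.

```python
from typing import Iterable, List


def matching_pids(ps_lines: Iterable[str], pattern: str) -> List[int]:
    """Return PIDs from `ps aux` output whose COMMAND column contains `pattern`.

    ASSUMPTION: columns are USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    pids = []
    for line in ps_lines:
        fields = line.split(None, 10)  # split off the 10 fixed columns; COMMAND keeps its spaces
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or malformed line
        if pattern in fields[10]:
            pids.append(int(fields[1]))
    return pids
```

With the flag included, a check for "reports-release.pl --update" ignores a running --create job, and vice versa.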

preaction commented 8 years ago

We've now reached 72,000,000 reports processed for the release data. I've checked release.db and it contains updated reports. However, MetaCPAN does not seem to have loaded them: https://metacpan.org/release/DROLSKY/Params-CheckCompiler-0.07 definitely has reports in the release database, but they are not currently shown. I'm going to let the MetaCPAN team know the status and see whether the problem is on their end.
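A quick way to check a particular release in the published file is to decompress release.db.gz and query it. A minimal sketch, assuming a table named release with dist, version, pass and fail columns; the real schema of release.db may differ, and reports_for is a hypothetical helper.

```python
import gzip
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def reports_for(db_gz: Path, dist: str, version: str) -> Optional[Tuple[int, int]]:
    """Return (pass, fail) totals for dist/version, or None if there are no rows.

    ASSUMPTION: hypothetical table "release"(dist, version, pass, fail).
    """
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "release.db"
        # Decompress the downloaded release.db.gz into a scratch file.
        with gzip.open(str(db_gz), "rb") as src, open(db_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        conn = sqlite3.connect(str(db_path))
        try:
            row = conn.execute(
                'SELECT SUM(pass), SUM(fail) FROM "release" WHERE dist = ? AND version = ?',
                (dist, version),
            ).fetchone()
        finally:
            conn.close()  # close before the temporary directory is removed
    # SUM() yields NULL (None) when no rows match.
    return None if row[0] is None else (row[0], row[1])
```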

From our end this looks fixed. Closing.