cpan-testers / cpantesters-backend

Backend data processing for CPAN Testers

Generate list of Perl versions from CPAN mirror and parsed reports #7

Open preaction opened 7 years ago

preaction commented 7 years ago

There is a perl_version table which caches the known list of Perl versions for easy reference. This table was maintained by the main report processing task (CPAN::Testers::Data::Generator). We need a way to build this table from scratch using data from the local CPAN mirror.

We should have one module, CPAN::Testers::Backend::ProcessPerlVersion. This module should be a runnable module (Beam::Runnable) that, when run, does the following:

This will require a DBIx::Class result class to read and write the perl_version table (CPAN::Testers::Schema::Result::PerlVersion, in cpan-testers/cpantesters-schema). It may be easier to put the method that reads the CPAN directory in a CPAN::Testers::Schema::ResultSet::PerlVersion class, since it's better design to push as much data/business logic as possible into the model layer.
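A minimal sketch of the "read the CPAN directory" step, assuming the mirror exposes Perl releases as tarballs named like perl-5.26.1.tar.gz. The function name perl_versions_from_paths, the filename pattern, and the example paths are all hypothetical and only illustrate the idea; this is not the code in cpantesters-schema.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: extract distinct Perl versions from a list of
# tarball paths. Only names of the form perl-5.X.Y.tar.{gz,bz2,xz}
# are accepted, so release candidates and other dists are skipped.
sub perl_versions_from_paths {
    my @paths = @_;
    my %seen;
    for my $path (@paths) {
        next unless $path =~ m{(?:^|/)perl-(5\.\d+\.\d+)\.tar\.(?:gz|bz2|xz)\z};
        $seen{$1} = 1;
    }
    # Sort as versions, not strings, so 5.8.9 comes before 5.26.1
    my @versions = sort { version->parse($a) <=> version->parse($b) } keys %seen;
    return @versions;
}

# Example paths are made up for illustration
my @found = perl_versions_from_paths(
    'src/5.0/perl-5.26.1.tar.gz',
    'src/5.0/perl-5.26.1.tar.xz',
    'src/5.0/perl-5.26.0-RC1.tar.gz',
    'authors/id/X/XX/XXX/perl-5.8.9.tar.bz2',
    'authors/id/X/XX/XXX/Foo-Bar-1.0.tar.gz',
);
print "$_\n" for @found;
```

In a ResultSet class the list of paths would come from walking the local mirror, and each version found would be written to perl_version instead of printed.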

preaction commented 6 years ago

I've added the script that will populate Perl versions from the local CPAN mirror, but this is insufficient. Two things remain on this ticket:

The second task is necessary mostly because of patched Perls or other Perls that are not released on CPAN. ProcessReports should call the ensure_exists method of the PerlVersion resultset for every report it processes (see cpan-testers/cpantesters-schema#23).
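A rough sketch of the per-report call, assuming ensure_exists takes a version string and adds a row only when one is missing. The real signature lives in cpantesters-schema#23 and may differ. Sketch::PerlVersionSet is a hypothetical in-memory stand-in for the resultset, and process_reports and the report hashes are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the PerlVersion resultset: a hash plays
# the role of the perl_version table.
package Sketch::PerlVersionSet {
    sub new { return bless { rows => {} }, shift }

    # Add the version only if it is not already known.
    # Returns 1 when a row was added, 0 otherwise.
    sub ensure_exists {
        my ( $self, $version ) = @_;
        return 0 if exists $self->{rows}{$version};
        $self->{rows}{$version} = 1;
        return 1;
    }

    sub versions {
        my ($self) = @_;
        my @versions = sort keys %{ $self->{rows} };
        return @versions;
    }
}

package main;

# Hypothetical hook: call ensure_exists once for every processed report
sub process_reports {
    my ( $perl_versions, @reports ) = @_;
    my $added = 0;
    for my $report (@reports) {
        $added += $perl_versions->ensure_exists( $report->{perl} );
    }
    return $added;
}

my $set   = Sketch::PerlVersionSet->new;
my $added = process_reports(
    $set,
    { perl => '5.26.1' },
    { perl => '5.26.1' },
    { perl => '5.10.0 patch 1' },    # made-up patched Perl string
);
print "added $added versions\n";
```

Because the call is idempotent, it is safe to make for every report, including reports from Perls that already have a row.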

preaction commented 6 years ago

We also need to fix the existing data: there are cpanstats rows without a corresponding row in the perl_version table. We should add a FixPerlVersions command that will look for these rows and add the missing versions. The query to just find the missing entries is:

select distinct cpanstats.perl
from cpanstats
left join perl_version on perl_version.version = cpanstats.perl
where cpanstats.perl != '0' and perl_version.version is null;

The "0" Perl version seems to be from some reporter back in the day. None of these records are more recent than 2010.
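To make the intent of the query concrete, here is the same anti-join written as plain Perl over two in-memory lists. The function name missing_perl_versions and the sample data are hypothetical. A real FixPerlVersions command would run the query against the database instead.

```perl
use strict;
use warnings;

# Same logic as the query: distinct cpanstats.perl values with no
# matching perl_version.version, ignoring the "0" placeholder.
sub missing_perl_versions {
    my ( $cpanstats_perls, $known_versions ) = @_;
    my %known = map { $_ => 1 } @$known_versions;
    my %missing;
    for my $perl (@$cpanstats_perls) {
        next unless defined $perl;    # NULLs never pass the != comparison in SQL
        next if $perl eq '0';         # the old placeholder version
        next if $known{$perl};        # already in perl_version
        $missing{$perl} = 1;          # hash keys give us "distinct"
    }
    my @missing = sort keys %missing;
    return @missing;
}

# Made-up sample data
my @missing = missing_perl_versions(
    [ '5.26.1', '0', '5.10.0 patch 1', '5.10.0 patch 1', '5.8.9' ],
    [ '5.26.1', '5.8.9' ],
);
print "$_\n" for @missing;
```

Each value returned is a version that the fix command would need to add to perl_version.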