cpan-testers / cpantesters-api

An API into data held by CPAN Testers: test reports and CPAN uploads

Improve performance of Metabase tail/log.txt #32

Open preaction opened 6 years ago

preaction commented 6 years ago

Right now, we generate the Metabase tail/log.txt on the backend every 10 minutes. The process takes 5-8 minutes, so the data is always slightly out of date. This isn't a huge problem on its own, except that the Metabase runs on two servers, and each server has its own version of the tail log. So, anyone trying to coordinate data could get different data from one request to the next.

Getting the list of reports takes mere seconds, which means that the performance problem must be somewhere outside of the database.

It's possible to make the process faster in a couple of ways. The biggest win would be to make finding the CPAN author of each distribution faster. This could involve fixing the CPAN::Testers::Schema::Result::TestReport relationship to the uploads table (right now it's not a relationship at all). Unfortunately, the test_report and uploads tables cannot be easily joined, since they have different character encodings, so any solution will have to address that. Another possibility would be to grab all the information from the uploads table in a single request: collect the list of dist/version pairs, execute one query to get the data, and build a hash for lookups.
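The single-request idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the column names (dist, version, author), the helper names (key_for, fetch_uploads, build_author_lookup), and the in-memory @uploads data are all hypothetical stand-ins, and fetch_uploads simulates what would be one real query against the uploads table.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the uploads table. In the real code this data
# would come from one database query; the column names are assumptions.
my @uploads = (
    { dist => 'Foo-Bar', version => '1.00', author => 'ALICE' },
    { dist => 'Foo-Bar', version => '1.01', author => 'ALICE' },
    { dist => 'Baz',     version => '0.02', author => 'BOB' },
);

# Build one hash key from a dist/version pair. "\0" is used as the
# separator because it should not appear in either value.
sub key_for {
    my ( $dist, $version ) = @_;
    return join "\0", $dist, $version;
}

# Simulates a single round trip returning every matching uploads row.
sub fetch_uploads {
    my (@pairs) = @_;
    my %wanted = map { ( key_for(@$_) => 1 ) } @pairs;
    return grep { $wanted{ key_for( $_->{dist}, $_->{version} ) } } @uploads;
}

# Collect the unique dist/version pairs from the reports, fetch them all
# at once, and return a hashref mapping each pair's key to its author.
sub build_author_lookup {
    my (@reports) = @_;
    my %seen;
    my @pairs = grep { !$seen{ key_for(@$_) }++ }
        map { [ $_->{dist}, $_->{version} ] } @reports;
    my %author_for;
    for my $row ( fetch_uploads(@pairs) ) {
        $author_for{ key_for( $row->{dist}, $row->{version} ) } = $row->{author};
    }
    return \%author_for;
}

# Example usage: three reports, two of them for the same release.
my @reports = (
    { dist => 'Foo-Bar', version => '1.01' },
    { dist => 'Baz',     version => '0.02' },
    { dist => 'Foo-Bar', version => '1.01' },
);
my $author_for = build_author_lookup(@reports);
for my $report (@reports) {
    my $author = $author_for->{ key_for( $report->{dist}, $report->{version} ) }
        // 'unknown';
    print "$report->{dist}-$report->{version} by $author\n";
}
```

Doing the matching in a Perl hash avoids the SQL join between the two tables, but it does not by itself settle the encoding difference: the dist and version strings from both tables would still need to be decoded to the same form before they are used as keys.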

It would be good to profile this code to figure out what's slow before any performance improvements are made, and again afterwards to verify that the improvements actually help. The tail log can be generated by running perl bin/cpantesters-legacy-metabase eval 'app->refresh_tail_log'. Try using NYTProf (Devel::NYTProf) to profile the code, for example by running the same command with perl -d:NYTProf and then generating a report with nytprofhtml.