anhnongdan / cBIMAX

Official source code for cBIMAX - the analytics system for CDN

Calculate Pageview(Hit) vs Actions #25

Closed anhnongdan closed 7 years ago

anhnongdan commented 7 years ago

=> A pageview is an action whose action type is a URL.

-> The CDN shows no downloads, no outlinks, etc.

anhnongdan commented 7 years ago

Verify actions metrics on #32

MariaDB [pw1]> select count(*) from piwik_log_action where type!=1 and type!=4;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
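
The check above can be reproduced on a toy table. This is a minimal Python sketch using an in-memory SQLite database, not the MariaDB instance from the thread. The table is reduced to the two columns the query touches, and reading type 1 as "page URL" and type 4 as "page title" is an assumption about Piwik's action type constants.

```python
import sqlite3

# Assumption: in Piwik, action type 1 = page URL and 4 = page title.
PAGE_URL, PAGE_TITLE = 1, 4


def make_demo_db(types):
    """Build an in-memory piwik_log_action table with one row per given type (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE piwik_log_action (idaction INTEGER PRIMARY KEY, type INTEGER)"
    )
    conn.executemany(
        "INSERT INTO piwik_log_action (type) VALUES (?)", [(t,) for t in types]
    )
    return conn


def count_non_page_actions(conn):
    """Same predicate as the query in the thread: rows that are neither type 1 nor type 4."""
    row = conn.execute(
        "SELECT count(*) FROM piwik_log_action WHERE type != ? AND type != ?",
        (PAGE_URL, PAGE_TITLE),
    ).fetchone()
    return row[0]


# A CDN-like log with only page URLs and page titles.
cdn_like = make_demo_db([1, 1, 4, 1, 4])
print(count_non_page_actions(cdn_like))
```

If the count is 0, every recorded action is a page URL or page title, so there is nothing other than pageviews for the actions total to include.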
anhnongdan commented 7 years ago

In the Actions plugin: Pageview -> NB_PAGE_HIT:

PiwikMetrics::INDEX_PAGE_NB_HITS => array(
    'aggregation' => 'sum',
    'query' => "count(*)"
),

MultiSite gets pageviews from Actions: const NB_PAGEVIEWS_METRIC = 'Actions_nb_pageviews';

But MultiSite's actions metric is a different one: const NB_ACTIONS_METRIC = 'nb_actions';

In LogAggregator:

protected function getActionsMetricFields()
{
    return array(
        Metrics::INDEX_NB_VISITS        => "count(distinct " . self::LOG_ACTIONS_TABLE . ".idvisit)",
        Metrics::INDEX_NB_UNIQ_VISITORS => "count(distinct " . self::LOG_ACTIONS_TABLE . ".idvisitor)",
        Metrics::INDEX_NB_ACTIONS       => "count(*)",
    );
}
anhnongdan commented 7 years ago

After fixing day visit calculation: https://github.com/anhnongdan/BimaxCore#6

DEV Total: 199 visits, 4,400 hits, 10,657 actions, 1.64 G tranffered overall

These 4,400 hits match the number reported by the Actions plugin.

STANDARD Total: 199 visits, 8,100 pageviews, 8,100 actions, 0 revenue

anhnongdan commented 7 years ago

Apparently, the action calculation has two branches: the sum of all visits' visit_total_actions, or count(*) from the action table:

DEBUG VisitsSummary[2017-08-11 10:49:28] SELECT
DEBUG VisitsSummary[2017-08-11 10:49:28] count(distinct log_visit.idvisitor) AS `1`,
DEBUG VisitsSummary[2017-08-11 10:49:28] count(*) AS `2`,
DEBUG VisitsSummary[2017-08-11 10:49:28] sum(log_visit.visit_total_actions) AS `3`,
DEBUG VisitsSummary[2017-08-11 10:49:28] max(log_visit.visit_total_actions) AS `4`,
DEBUG VisitsSummary[2017-08-11 10:49:28] sum(log_visit.visit_total_time) AS `5`,
DEBUG VisitsSummary[2017-08-11 10:49:28] sum(case log_visit.visit_total_actions when 1 then 1 when 0 then 1 else 0 end) AS `6`,
DEBUG VisitsSummary[2017-08-11 10:49:28] sum(case log_visit.visit_goal_converted when 1 then 1 else 0 end) AS `7`,
DEBUG VisitsSummary[2017-08-11 10:49:28] count(distinct log_visit.user_id) AS `39`
DEBUG VisitsSummary[2017-08-11 10:49:28] FROM
DEBUG VisitsSummary[2017-08-11 10:49:28] piwik_log_visit AS log_visit
DEBUG VisitsSummary[2017-08-11 10:49:28] WHERE
DEBUG VisitsSummary[2017-08-11 10:49:28] log_visit.visit_last_action_time >= ?
DEBUG VisitsSummary[2017-08-11 10:49:28] AND log_visit.visit_last_action_time <= ?
DEBUG VisitsSummary[2017-08-11 10:49:28] AND log_visit.idsite IN (?)
DEBUG VisitsSummary[2017-08-11 10:49:28] LogAggr.queryVisitsByDimension: bind: ["2017-08-07 16:00:00","2017-08-07 16:09:59",3]
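
To make the two branches concrete, here is a minimal Python sketch on an in-memory SQLite database. The tables are cut down to the columns needed (no `piwik_` prefix, no time or site filters), the data is invented, and deleting rows stands in for rotating log_link_visit_action. It only illustrates one way the two numbers can drift apart; it is not a claim about what happened on the server.

```python
import sqlite3


def make_demo_db(visits):
    """visits: dict mapping idvisit -> number of actions in that visit (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_visit (idvisit INTEGER PRIMARY KEY, visit_total_actions INTEGER)"
    )
    conn.execute(
        "CREATE TABLE log_link_visit_action (idlink_va INTEGER PRIMARY KEY, idvisit INTEGER)"
    )
    for idvisit, n_actions in visits.items():
        conn.execute("INSERT INTO log_visit VALUES (?, ?)", (idvisit, n_actions))
        conn.executemany(
            "INSERT INTO log_link_visit_action (idvisit) VALUES (?)",
            [(idvisit,)] * n_actions,
        )
    return conn


def action_counts(conn):
    """Return (branch 1: sum of visit_total_actions, branch 2: count(*) of action rows)."""
    from_visits = conn.execute(
        "SELECT coalesce(sum(visit_total_actions), 0) FROM log_visit"
    ).fetchone()[0]
    from_actions = conn.execute(
        "SELECT count(*) FROM log_link_visit_action"
    ).fetchone()[0]
    return from_visits, from_actions


conn = make_demo_db({1: 2, 2: 3})
print(action_counts(conn))  # both branches agree while the action table is complete

# Stand-in for rotation: drop the action rows of visit 1 but keep the visit row.
conn.execute("DELETE FROM log_link_visit_action WHERE idvisit = 1")
print(action_counts(conn))  # the visit-based sum no longer matches the row count
```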
anhnongdan commented 7 years ago

With the new calculation process, we can see:

  1. All sub-periods (hours) are calculated.
  2. Then ArchiveSelector collects the metrics.
  3. If the period is 'day', it recalculates visits.
  4. ArchiveSelector returns the result to ArchiveProcessor.
  5. ArchiveProcessor returns all metrics (to PluginArchive as a DataTable) and writes the archive to the archive tables.
  6. The Actions plugin archiver kicks in, calculates min and max, and breaks down the actions:
DEBUG VisitsSummary[2017-08-14 04:27:38] ArchiveSelector::getArchiveIdAndVisits dateStartIso: 2017-08-08 23:50:00 EndIso: 2017-08-08 23:59:59 ts_archiveUTC: 2017-08-08 15:59:59
DEBUG VisitsSummary[2017-08-14 04:27:38] ArchiveProcessor::getAggregatedNumericMetrics, recalculate visit for day
DEBUG VisitsSummary[2017-08-14 04:27:38] LogAggr::getGeneralQueryBindParams: start:2017-08-07 16:00:00 end:2017-08-08 15:59:59
DEBUG VisitsSummary[2017-08-14 04:27:38] LogAggr.queryVisitsByDimension: without ranking query: /* trigger = CronArchive */
DEBUG VisitsSummary[2017-08-14 04:27:38]
DEBUG VisitsSummary[2017-08-14 04:27:38] SELECT
DEBUG VisitsSummary[2017-08-14 04:27:38] count(*) AS `2`,
DEBUG VisitsSummary[2017-08-14 04:27:38] sum(log_visit.visit_total_time) AS `5`,
DEBUG VisitsSummary[2017-08-14 04:27:38] sum(case log_visit.visit_goal_converted when 1 then 1 else 0 end) AS `7`
DEBUG VisitsSummary[2017-08-14 04:27:38] FROM
DEBUG VisitsSummary[2017-08-14 04:27:38] piwik_log_visit AS log_visit
DEBUG VisitsSummary[2017-08-14 04:27:38] WHERE
DEBUG VisitsSummary[2017-08-14 04:27:38] log_visit.visit_last_action_time >= ?
DEBUG VisitsSummary[2017-08-14 04:27:38] AND log_visit.visit_last_action_time <= ?
DEBUG VisitsSummary[2017-08-14 04:27:38] AND log_visit.idsite IN (?)
DEBUG VisitsSummary[2017-08-14 04:27:38] LogAggr.queryVisitsByDimension: bind: ["2017-08-07 16:00:00","2017-08-08 15:59:59",3]
DEBUG VisitsSummary[2017-08-14 04:27:38] ArchiveProcessor::recalculateVisitAndDurationForDay, recalculate result: 199, 171920
DEBUG VisitsSummary[2017-08-14 04:27:38] LogAggr::getGeneralQueryBindParams: start:2017-08-07 16:00:00 end:2017-08-08 15:59:59
DEBUG VisitsSummary[2017-08-14 04:27:38] LogAggr.queryVisitsByDimension: without ranking query: /* trigger = CronArchive */
DEBUG VisitsSummary[2017-08-14 04:27:38]
DEBUG VisitsSummary[2017-08-14 04:27:38] SELECT
DEBUG VisitsSummary[2017-08-14 04:27:38] count(distinct log_visit.idvisitor) AS `1`,
DEBUG VisitsSummary[2017-08-14 04:27:38] count(distinct log_visit.user_id) AS `39`
DEBUG VisitsSummary[2017-08-14 04:27:38] FROM
DEBUG VisitsSummary[2017-08-14 04:27:38] piwik_log_visit AS log_visit
DEBUG VisitsSummary[2017-08-14 04:27:38] WHERE
DEBUG VisitsSummary[2017-08-14 04:27:38] log_visit.visit_last_action_time >= ?
DEBUG VisitsSummary[2017-08-14 04:27:38] AND log_visit.visit_last_action_time <= ?
DEBUG VisitsSummary[2017-08-14 04:27:38] AND log_visit.idsite IN (?)
DEBUG VisitsSummary[2017-08-14 04:27:38] LogAggr.queryVisitsByDimension: bind: ["2017-08-07 16:00:00","2017-08-08 15:59:59",3]
DEBUG VisitsSummary[2017-08-14 04:27:38] ArchiveProcessor::getAggregatedNumericMetrics, The returned metric: nb_uniq_visitors=199,nb_visits=199,nb_actions=10657,nb_users=0,max_actions=121,sum_visit_length=171920,bounce_count=4,nb_visits_converted=0,nb_visit_converted=0
DEBUG VisitsSummary[2017-08-14 04:27:39] PluginsArchiver::callAggregateAllPlugins: Initializing archiving process for all plugins [visits = 199, visits converted = 0]
DEBUG VisitsSummary[2017-08-14 04:27:39] PluginsArchiver::callAggregateAllPlugins: Archiving period reports for plugin 'Actions'.
DEBUG Actions[2017-08-14 04:27:39] [Thangnt 1107] Ar.Proc::aggregateDataTableRecords recordName: Actions_actions, aggre. operation: {"33":"max","32":"min"}
DEBUG Actions[2017-08-14 04:27:39] CoreArchive get data for: [3], Actions_actions, blob
DEBUG Actions[2017-08-14 04:27:39] CoreArchive get archiveids for: Actions
DEBUG Actions[2017-08-14 04:27:39] CoreArchive get doneFlag: done
DEBUG Actions[2017-08-14 04:27:39] [Thangnt 1107] Ar.Proc::aggregateDataTableRecords recordName: Actions_actions_url, aggre. operation: {"33":"max","32":"min"}
DEBUG Actions[2017-08-14 04:27:39] CoreArchive get data for: [3], Actions_actions_url, blob
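
The "aggre. operation" entries in the log above suggest that sub-period rows are merged column by column, with max and min for columns 33 and 32 and a sum for columns such as the hit count. A minimal Python sketch of that idea, assuming a default of "sum" for any column without an explicit operation; `aggregate_rows`, the `hits` column name and the sample values are hypothetical and not taken from the Piwik code.

```python
def aggregate_rows(rows, operations=None):
    """Merge sub-period rows column by column.

    rows: list of dicts mapping column id -> numeric value.
    operations: dict mapping column id -> "sum" | "max" | "min";
                columns not listed are summed (assumed default).
    """
    reducers = {"sum": sum, "max": max, "min": min}
    operations = operations or {}
    collected = {}
    for row in rows:
        for column, value in row.items():
            collected.setdefault(column, []).append(value)
    return {
        column: reducers[operations.get(column, "sum")](values)
        for column, values in collected.items()
    }


# Two invented hourly rows; "32" and "33" are the column ids from the log line.
hourly = [
    {"hits": 100, "32": 0.2, "33": 1.5},
    {"hits": 50, "32": 0.1, "33": 3.0},
]
print(aggregate_rows(hourly, {"33": "max", "32": "min"}))
```

Summing is only safe for metrics that are additive across sub-periods, which is why the min and max columns need their own operations.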
anhnongdan commented 7 years ago

The problem in the comment three back might be caused by log rotation. Verifying with today's log.

Verified with 2 periods of log and everything seems ok: the sum of actions is correct, and hits and actions now agree. Visits 'look' ok. => Will verify on the live server.

anhnongdan commented 7 years ago

Pageviews are calculated by the Actions plugin and actions by VisitsSummary:

screenshot from 2017-08-16 13-15-06

anhnongdan commented 7 years ago

With the current code, calculating on local cBimax without rotating the log_link_visit_action table gives accurate pageview and action counts (verified against the number of log lines imported).

Total: 550 visits, 12,000 hits, 12,000 actions, 4.87 G tranffered overall

anhnongdan commented 7 years ago

=> Result: there is a gap between hits and actions: Total: 580 visits, 13,400 hits, 13,875 actions

Hourly (16:00 used to be 1k before rotation):

screenshot from 2017-08-16 16-36-21

Temp archive calculated correctly (17:00 and 17:20 don't have data):

| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:31 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:31 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:31 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:31 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:31 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:31 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:32 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:32 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:32 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:32 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:33 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:33 |
| 2017-08-16 17:10:00 | 2017-08-16 17:19:59 | 2017-08-16 09:34:33 |
anhnongdan commented 7 years ago

Next: reduce log_link_visit_action to 0

Then import 2.8k (hmm, the result seems correct)

screenshot from 2017-08-16 16-52-45

Anyway, actions and pageviews:

Total: 773 visits, 16,200 hits, 16,675 actions, 6.61 G tranffered overall

Previous: 580 visits, 13,400 hits, 13,875 actions

The gap doesn't change. Then import 0.7k without rotating > confirm in DB.

And the gap opens up again: Total: 834 visits, 16,900 hits, 19,749 actions, 6.87 G tranffered overall
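
For reference, the gap arithmetic for the three totals in this comment and the previous one, as a small Python sketch. The numbers are the ones reported in the thread; the labels and the `gaps` helper are mine.

```python
# (label, hits, actions) as reported in the thread.
totals = [
    ("previous total", 13400, 13875),
    ("after importing 2.8k", 16200, 16675),
    ("after importing 0.7k", 16900, 19749),
]


def gaps(rows):
    """Return (label, actions - hits) for each reported total."""
    return [(label, actions - hits) for label, hits, actions in rows]


for label, gap in gaps(totals):
    print(f"{label}: gap = {gap}")
# previous total: gap = 475
# after importing 2.8k: gap = 475
# after importing 0.7k: gap = 2849
```

So the 2.8k import added 2,800 to both hits and actions, while the 0.7k import added 700 hits but 3,074 actions.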

anhnongdan commented 7 years ago

Hypothesis: actions are calculated by VisitsSummary as a core metric. I can fix this by simply adding nb_actions to VisitsSummary's recalculation and running the archive again; the result MUST be reflected immediately.

anhnongdan commented 7 years ago

It doesn't change :-1: Suspect: the 'hourly' archive checks that there are no new visits since the last visit (16:00 - 16:10) and omits the recalculation.

Clean all tables and re-import the log.

1.7k @17:52 Total: 226 visits, 1,700 hits, 1,700 actions, 675.74 M tranffered overall

0.8k @18:18

0.8k @18:25

Still have a problem: Total: 1,132 visits, 3,300 hits, 4,160 actions, 1.32 G tranffered overall

nb_actions still wrong:

| nb_actions | 2017-08-16 |  1700 |
| nb_actions | 2017-08-16 |  4160 |
anhnongdan commented 7 years ago

I missed one line of code: $row->setColumn('nb_actions', $visits[Metrics::INDEX_NB_ACTIONS]);

Ran the archive again without changing anything else. It takes effect immediately.

Total: 1,132 visits, 3,300 hits, 3,300 actions, 1.32 G tranffered overall
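
A rough Python sketch of what the missing line changes, assuming the day row starts as the sum of the hourly rows and selected metrics are then overwritten with values recalculated from the log. `day_metrics` and the hourly split are invented for illustration; only the 4,160 and 3,300 totals come from this thread, and this is not the BimaxCore code.

```python
def day_metrics(hourly_rows, recalculated, override):
    """Sum hourly rows, then overwrite the metrics named in `override`
    with the values recalculated from the log for the whole day."""
    summed = {}
    for row in hourly_rows:
        for name, value in row.items():
            summed[name] = summed.get(name, 0) + value
    for name in override:
        if name in recalculated:
            summed[name] = recalculated[name]
    return summed


# Invented hourly split that adds up to the wrong total seen above (4,160).
hourly = [{"nb_actions": 1700}, {"nb_actions": 2460}]
# Value recalculated from the log for the day (matches the 3,300 hits).
recalculated = {"nb_actions": 3300}

# Before the fix: nb_actions is not in the override list, so the sum is kept.
print(day_metrics(hourly, recalculated, override=[]))
# After the fix: nb_actions is overwritten by the recalculated value.
print(day_metrics(hourly, recalculated, override=["nb_actions"]))
```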

Confirm on Live Server

anhnongdan commented 7 years ago

Now the live server gives the correct result.

Verify again on #32. Fix is in anhnongdan/BimaxCore#6.