anhnongdan / sBIMAX

Official source code for sBIMAX - the stream analyzer for Media Content Provider
0 stars 0 forks source link

[MediaAnalytics] Completion percentile plays count for Video Analytics #8

Open anhnongdan opened 6 years ago

anhnongdan commented 6 years ago

Video Analytics currently provides:

https://videoanalytic.vnpt.bimax.tv/pw2/index.php?module=API&method=MediaAnalytics.getVideoResources&date=today&period=day&idSite=2018&token_auth=bce12b0011714f1836a4b4cccdbb190f

<result>
<row>
<label>vod.thvli.vn</label>
<nb_plays>59703</nb_plays>
<nb_unique_visitors_plays>47261</nb_unique_visitors_plays>
<nb_impressions>60082</nb_impressions>
<nb_unique_visitors_impressions>47527</nb_unique_visitors_impressions>
<nb_finishes>1840</nb_finishes>
<sum_time_progress>69643088</sum_time_progress>
<nb_plays_with_tip>59635</nb_plays_with_tip>
<nb_plays_with_ml>59701</nb_plays_with_ml>
<sum_fullscreen_plays>7694</sum_fullscreen_plays>
<play_rate>0.99</play_rate>
<finish_rate>0.03</finish_rate>
<fullscreen_rate>0.13</fullscreen_rate>
<avg_time_watched>1167</avg_time_watched>
<avg_completion_rate>0.12</avg_completion_rate>
<avg_time_to_play>31</avg_time_to_play>
<avg_media_length>9654</avg_media_length>
<openable>1</openable>
<idsubdatatable>31011</idsubdatatable>
</row>
<row>
<label>live2.thvli.vn</label>
<nb_plays>6359</nb_plays>
<nb_unique_visitors_plays>6068</nb_unique_visitors_plays>
<nb_impressions>6361</nb_impressions>
<nb_unique_visitors_impressions>6070</nb_unique_visitors_impressions>
<nb_finishes>0</nb_finishes>
<sum_time_progress>13510</sum_time_progress>
<nb_plays_with_tip>6352</nb_plays_with_tip>
<nb_plays_with_ml>6358</nb_plays_with_ml>
<sum_fullscreen_plays>890</sum_fullscreen_plays>
<play_rate>1</play_rate>
<finish_rate>0</finish_rate>
<fullscreen_rate>0.14</fullscreen_rate>
<avg_time_watched>11</avg_time_watched>
<avg_completion_rate>0</avg_completion_rate>
<avg_time_to_play>38</avg_time_to_play>
<avg_media_length>15098</avg_media_length>
<openable>1</openable>
<idsubdatatable>2326</idsubdatatable>
</row>
<row>
<label/>
<nb_plays>3513</nb_plays>
<nb_unique_visitors_plays>3353</nb_unique_visitors_plays>
<nb_impressions>4657</nb_impressions>
<nb_unique_visitors_impressions>4376</nb_unique_visitors_impressions>
<nb_finishes>468</nb_finishes>
<sum_time_progress>5953452</sum_time_progress>
<nb_plays_with_tip>505</nb_plays_with_tip>
<nb_plays_with_ml>3279</nb_plays_with_ml>
<sum_fullscreen_plays>0</sum_fullscreen_plays>
<play_rate>0.75</play_rate>
<finish_rate>0.13</finish_rate>
<fullscreen_rate>0</fullscreen_rate>
<avg_time_watched>1034</avg_time_watched>
<avg_completion_rate>0.42</avg_completion_rate>
<avg_time_to_play>63</avg_time_to_play>
<avg_media_length>2621</avg_media_length>
<openable>1</openable>
<idsubdatatable>2300</idsubdatatable>
</row>
anhnongdan commented 6 years ago
MariaDB [pw2]> desc piwik_log_media;
+----------------------+---------------------+------+-----+---------+-------+
| Field                | Type                | Null | Key | Default | Extra |
+----------------------+---------------------+------+-----+---------+-------+
| idvisitor            | binary(8)           | NO   |     | NULL    |       |
| idvisit              | bigint(20) unsigned | NO   | PRI | NULL    |       |
| idsite               | int(11) unsigned    | NO   | MUL | NULL    |       |
| idview               | varchar(16)         | NO   | PRI | NULL    |       |
| player_name          | varchar(20)         | NO   |     | NULL    |       |
| media_type           | tinyint(1)          | NO   |     | NULL    |       |
| resolution           | varchar(20)         | YES  |     |         |       |
| fullscreen           | tinyint(1) unsigned | NO   |     | NULL    |       |
| media_title          | varchar(150)        | YES  |     |         |       |
| resource             | varchar(300)        | NO   |     | NULL    |       |
| server_time          | datetime            | NO   |     | NULL    |       |
| time_to_initial_play | int(11) unsigned    | YES  |     | NULL    |       |
| watched_time         | bigint(20) unsigned | YES  |     | 0       |       |
| media_progress       | int(11) unsigned    | YES  |     | 0       |       |
| media_length         | int(11) unsigned    | YES  |     | 0       |       |
+----------------------+---------------------+------+-----+---------+-------+
15 rows in set (0.00 sec)
anhnongdan commented 6 years ago
MariaDB [pw2]> select distinct(name) from piwik_archive_blob_2017_12 where name like "%Media%" limit 40;
+------------------------------------------------------------+
| name                                                       |
+------------------------------------------------------------+
| MediaAnalytics_audio_groupedresources_record               |
| MediaAnalytics_audio_resources_record                      |
| MediaAnalytics_audio_title_record                          |
| MediaAnalytics_video_groupedresources_record               |
| MediaAnalytics_video_groupedresources_record_chunk_0_99    |
| MediaAnalytics_video_resources_record                      |
| MediaAnalytics_video_resources_record_chunk_0_99           |
| MediaAnalytics_video_title_record                          |
| MediaAnalytics_video_title_record_chunk_0_99               |
| MediaAnalytics_audio_hours_record                          |
| MediaAnalytics_playernames_record                          |
| MediaAnalytics_video_hours_record                          |
| MediaAnalytics_video_resolutions_record                    |
| MediaAnalytics_video_title_record_chunk_100_199            |
| MediaAnalytics_video_title_record_chunk_200_299            |
| MediaAnalytics_video_groupedresources_record_chunk_100_199 |
| MediaAnalytics_video_resources_record_chunk_100_199        |
| MediaAnalytics_video_resources_record_chunk_10000_10099    |
anhnongdan commented 6 years ago

Proposal:

The numbers need to be calculated from raw log. I can use MediaAnalytics' Archiver to make appropriate queries.

#145: private function makeRegularReport($dataArrays, $where, $groupByColumn, $withSubtableReport = false)

anhnongdan commented 6 years ago

Some popular queries that MediaAnalytics uses to calculate the archives:

DEBUG MediaAnalytics[2017-12-04 08:50:27] SELECT
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title as label,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(log_media.idvisit) as nb_impressions,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(distinct log_media.idvisitor) as nb_unique_visitors_impressions
DEBUG MediaAnalytics[2017-12-04 08:50:27] FROM
DEBUG MediaAnalytics[2017-12-04 08:50:27] piwik_log_media AS log_media
DEBUG MediaAnalytics[2017-12-04 08:50:27] WHERE
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.server_time >= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.server_time <= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.idsite IN (?)  AND log_media.media_type = 1
DEBUG MediaAnalytics[2017-12-04 08:50:27] GROUP BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] label
DEBUG MediaAnalytics[2017-12-04 08:50:27] ORDER BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] nb_impressions
DEBUG MediaAnalytics[2017-12-04 08:50:27] Media Analytics archiver: /* trigger = CronArchive */
DEBUG MediaAnalytics[2017-12-04 08:50:27]

Look at the below query, we can see that nb_finishs are counted as all entries where media progress reach 'the end' of media length. So how about when a visitor reach the end, then what again till the end? Plus, idvisit is counter for each title as nb_plays

DEBUG MediaAnalytics[2017-12-04 08:50:27] SELECT
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title as label,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(log_media.idvisit) as nb_plays,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(distinct log_media.idvisitor) as nb_unique_visitors_plays,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(if(log_media.media_length > 2 AND log_media.media_progress >= (log_media.media_length - 2), 1, 0)) as nb_finishes,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(log_media.time_to_initial_play) as sum_time_to_play,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(if(log_media.time_to_initial_play is null, 0, 1)) as nb_plays_with_tip,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(log_media.watched_time) as sum_time_watched,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(log_media.media_progress) as sum_time_progress,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(log_media.media_length) as sum_media_length,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(if(log_media.media_length > 0, 1, 0)) as nb_plays_with_ml,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(log_media.fullscreen) as sum_fullscreen_plays
DEBUG MediaAnalytics[2017-12-04 08:50:27] FROM
DEBUG MediaAnalytics[2017-12-04 08:50:27] piwik_log_media AS log_media
DEBUG MediaAnalytics[2017-12-04 08:50:27] WHERE
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.server_time >= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.server_time <= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.idsite IN (?)  AND watched_time > 1  AND log_media.media_type = 1
DEBUG MediaAnalytics[2017-12-04 08:50:27] GROUP BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] label
DEBUG MediaAnalytics[2017-12-04 08:50:27] ORDER BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] nb_plays
DEBUG MediaAnalytics[2017-12-04 08:50:27] SELECT
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title as parentLabel,
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.watched_time as label,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(log_media.watched_time) as nb_plays
DEBUG MediaAnalytics[2017-12-04 08:50:27] FROM
DEBUG MediaAnalytics[2017-12-04 08:50:27] piwik_log_media AS log_media
DEBUG MediaAnalytics[2017-12-04 08:50:27] WHERE
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.server_time >= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.server_time <= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.idsite IN (?)  AND watched_time > 1  AND log_media.media_type = 1
DEBUG MediaAnalytics[2017-12-04 08:50:27] GROUP BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title, log_media.watched_time
DEBUG MediaAnalytics[2017-12-04 08:50:27] Media Analytics archiver: /* trigger = CronArchive */
DEBUG MediaAnalytics[2017-12-04 08:50:27]
DEBUG MediaAnalytics[2017-12-04 08:50:27] SELECT
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title as parentLabel,
DEBUG MediaAnalytics[2017-12-04 08:50:27] round((media_progress / media_length) * 100) as label,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(log_media.media_length) as nb_plays
DEBUG MediaAnalytics[2017-12-04 08:50:27] FROM
DEBUG MediaAnalytics[2017-12-04 08:50:27] piwik_log_media AS log_media
DEBUG MediaAnalytics[2017-12-04 08:50:27] WHERE
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.server_time >= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.server_time <= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.idsite IN (?)  AND watched_time > 1  AND log_media.media_type = 1 AND log_media.media_length > 0
DEBUG MediaAnalytics[2017-12-04 08:50:27] GROUP BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title, label
DEBUG MediaAnalytics[2017-12-04 08:50:27] Media Analytics archiver: /* trigger = CronArchive */
DEBUG MediaAnalytics[2017-12-04 08:50:27]
DEBUG MediaAnalytics[2017-12-04 08:50:27] SELECT
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title as parentLabel,
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.resolution as label,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(log_media.idvisit) as nb_plays,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(if(log_media.media_length > 2 AND log_media.media_progress >= (log_media.media_length - 2), 1, 0)) as nb_finishes,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(log_media.watched_time) as sum_time_watched
DEBUG MediaAnalytics[2017-12-04 08:50:27] FROM
DEBUG MediaAnalytics[2017-12-04 08:50:27] piwik_log_media AS log_media
DEBUG MediaAnalytics[2017-12-04 08:50:27] WHERE
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.server_time >= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.server_time <= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.idsite IN (?)  AND watched_time > 1  AND log_media.media_type = 1
DEBUG MediaAnalytics[2017-12-04 08:50:27] GROUP BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title, log_media.resolution
DEBUG MediaAnalytics[2017-12-04 08:50:27] ORDER BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] nb_plays
DEBUG MediaAnalytics[2017-12-04 08:50:27] Media Analytics archiver: /* trigger = CronArchive */
anhnongdan commented 6 years ago

Notice that MediaAnalytics use sum(log_media.watched_time) as sum_time_watched to calc. sum watched time. It seems that watched time is measured individually in each tracking message (not accumulated).

DEBUG MediaAnalytics[2017-12-04 08:50:27] SELECT
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title as parentLabel,
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.resolution as label,
DEBUG MediaAnalytics[2017-12-04 08:50:27] count(log_media.idvisit) as nb_plays,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(if(log_media.media_length > 2 AND log_media.media_progress >= (log_media.media_length - 2), 1, 0)) as nb_finishes,
DEBUG MediaAnalytics[2017-12-04 08:50:27] sum(log_media.watched_time) as sum_time_watched
DEBUG MediaAnalytics[2017-12-04 08:50:27] FROM
DEBUG MediaAnalytics[2017-12-04 08:50:27] piwik_log_media AS log_media
DEBUG MediaAnalytics[2017-12-04 08:50:27] WHERE
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.server_time >= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.server_time <= ?
DEBUG MediaAnalytics[2017-12-04 08:50:27] AND log_media.idsite IN (?)  AND watched_time > 1  AND log_media.media_type = 1
DEBUG MediaAnalytics[2017-12-04 08:50:27] GROUP BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] log_media.media_title, log_media.resolution
DEBUG MediaAnalytics[2017-12-04 08:50:27] ORDER BY
DEBUG MediaAnalytics[2017-12-04 08:50:27] nb_plays

Plus, avg_completion_rate = avg_time_watched / avg_media_length

<avg_time_watched>1166</avg_time_watched>
<avg_completion_rate>0.12</avg_completion_rate>
<avg_time_to_play>30</avg_time_to_play>
<avg_media_length>9718</avg_media_length>

How watched_time is tracked?? #7

anhnongdan commented 6 years ago

Try counting watched_time:

MariaDB [pw2]> select log_media.media_title as label, sum(log_media.watched_time) as sum_time_watched from piwik_log_media AS log_media where log_media.server_time > "2017-12-04 09:00:00" and log_media.server_time < "2017-12-04 09:10:00" GROUP BY label limit 2;
+-------------------------------+------------------+
| label                         | sum_time_watched |
+-------------------------------+------------------+
|   �m M?u V� T�nh Y�u - T?p 406 |             2308 |
|   �m M?u V� T�nh Y�u - T?p 407 |              423 |
+-------------------------------+------------------+
2 rows in set (3 min 45.18 sec)

This kind of query take quite some time.

anhnongdan commented 6 years ago

From below query, we can see that plays for each title need to be counted by (idvisit, media_title). But watched_time looks a bit weird. While this visitor watched 17s on 2017-08-30 03:17:48 and then tracked 158s on 2017-08-30 03:18:14.

Be aware that every below entries have different idviews.

MariaDB [pw2]> select idsite, idview, idvisit, media_title, resource, watched_time, media_progress, server_time, media_length from piwik_log_media where idvisit=15 ORDER BY server_time;
+--------+------------------+---------+--------------------------------------+--------------------------------------------------------------+--------------+----------------+---------------------+--------------+
| idsite | idview           | idvisit | media_title                          | resource                                                     | watched_time | media_progress | server_time         | media_length |
+--------+------------------+---------+--------------------------------------+--------------------------------------------------------------+--------------+----------------+---------------------+--------------+
|   2000 | 72XtpZB2R6usKawz |      15 | T�nh Bolero � Phi�n B?n Ngh? S? 2017 | blob:///http://thvli.vn/4ecb20e1-dc69-4da4-bbc5-f4ef656bca94 |            0 |              0 | 2017-08-30 03:13:47 |         4893 |
|   2000 | mBg62ebgKF8GXvEt |      15 | T�nh Bolero � Phi�n B?n Ngh? S? 2017 | blob:///http://thvli.vn/d6eaafd2-eeea-431d-8026-c58f00b7cd1e |            0 |              0 | 2017-08-30 03:13:55 |            0 |
|   2000 | AcFbZq2SYCDXsZSG |      15 | TRANG CH?                            | blob:///http://thvli.vn/ad73c700-039a-46dc-a91b-8b5f61318814 |          192 |            811 | 2017-08-30 03:13:57 |         4289 |
|   2000 | jv44USmztrnNHjG3 |      15 | Ai S? Th�nh Sao Nh�                  | blob:///http://thvli.vn/c910e98d-c48d-4910-9fc5-4dd8906d0dfc |            0 |              1 | 2017-08-30 03:17:23 |            0 |
|   2000 | nMyY01NDno46Qnuq |      15 | Tr?n Trung K? �n                     | blob:///http://thvli.vn/feb1e34f-ef9e-46e3-86ba-e27976c8abee |           22 |           1623 | 2017-08-30 03:17:24 |         2758 |
|   2000 | fVa4WaLPDn6jmWex |      15 | Tr?n Trung K? �n                     | blob:///http://thvli.vn/4483b715-8310-4e06-af77-8ff6f799bdd9 |           17 |            486 | 2017-08-30 03:17:48 |         2765 |
|   2000 | yyQQ56LFkzALx0sG |      15 | Tr?n Trung K? �n                     | blob:///http://thvli.vn/1806e0d8-a967-4925-bee6-ecc2bb0944b4 |          158 |           2661 | 2017-08-30 03:18:14 |         2741 |
|   2000 | TJ88WPNB2cB3Ukm8 |      15 | Tr?n Trung K? �n                     | blob:///http://thvli.vn/a0a53630-fe49-4b38-9973-0ca2f8142d71 |           61 |            183 | 2017-08-30 03:21:20 |         2765 |
+--------+------------------+---------+--------------------------------------+--------------------------------------------------------------+--------------+----------------+---------------------+--------------+
8 rows in set (0.01 sec)

=> 'Tran Trung Ki An' title on idsite 2000 has length 45min46 (2746) on web UI. screen shot 2017-12-04 at 9 40 47 pm

anhnongdan commented 6 years ago

Remember that selecting the media title is very slow. It seems that media titles are stored as variable length UTF8 strings with no index.

anhnongdan commented 6 years ago

It seems that nb_play can be counted quite accurately. On Web UI, each title has Live and VoD files, so multiple titles for one media content might be available.

MariaDB [pw2]> select media_title as label, count(idvisit), idvisit, idsite, media_length as nb_plays from piwik_log_media as log_media where log_media.server_time > "2017-12-04 09:00:00" GROUP BY label limit 10;
+--------------------------------+----------------+----------+--------+----------+
| label                          | count(idvisit) | idvisit  | idsite | nb_plays |
+--------------------------------+----------------+----------+--------+----------+
|   �m M?u V� T�nh Y�u - T?p  412 |              1 | 26512304 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  414 |              3 | 26478571 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  416 |              1 | 26577275 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  417 |              1 | 26650324 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  419 |              1 | 26613024 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  420 |              2 | 26564261 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  425 |              1 | 26580218 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  428 |              1 | 26583316 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  435 |              1 | 26662478 |   2008 |    15111 |
|   �m M?u V� T�nh Y�u - T?p  448 |              1 | 26555277 |   2008 |    15111 |
+--------------------------------+----------------+----------+--------+----------+
10 rows in set (3 min 38.06 sec)
anhnongdan commented 6 years ago
MariaDB [pw2]> select media_title as label, idvisit, idsite, media_length, watched_time, server_time from piwik_log_media as log_media where log_media.server_time > "2017-12-04" AND idvisit=26478571;
+----------------------------------------------------------------------------------------------------------------------+----------+--------+--------------+--------------+---------------------+
| label                                                                                                                | idvisit  | idsite | media_length | watched_time | server_time         |
+----------------------------------------------------------------------------------------------------------------------+----------+--------+--------------+--------------+---------------------+
| �m M?u V� T�nh Y�u - T?p 921                                                                                         | 26478571 |   2008 |         2565 |         1983 | 2017-12-04 10:47:21 |
| �m M?u V� T�nh Y�u                                                                                                   | 26478571 |   2008 |        15111 |         2424 | 2017-12-04 03:27:01 |
| Solo c�ng Bolero - M�a 4                                                                                             | 26478571 |   2008 |        15111 |           10 | 2017-12-04 13:49:03 |
| �m M?u V� T�nh Y�u - T?p 397                                                                                         | 26478571 |   2008 |         2487 |            1 | 2017-12-04 13:52:53 |
| �m M?u V� T�nh Y�u -T?p 515                                                                                          | 26478571 |   2008 |        15111 |           10 | 2017-12-04 03:39:41 |
| �m M?u V� T�nh Y�u -T?p 547                                                                                          | 26478571 |   2008 |        15111 |          830 | 2017-12-04 04:18:11 |
| �m M?u V� T�nh Y�u - T?p 572                                                                                         | 26478571 |   2008 |        15111 |           10 | 2017-12-04 03:51:11 |
| �m M?u V� T�nh Y�u -T?p 546                                                                                          | 26478571 |   2008 |        15111 |         2650 | 2017-12-04 04:13:01 |
| �m M?u V� T�nh Y�u -T?p 474                                                                                          | 26478571 |   2008 |         2656 |            4 | 2017-12-04 03:51:41 |
| �m M?u V� T�nh Y�u - T?p 286                                                                                         | 26478571 |   2008 |        15111 |          443 | 2017-12-04 11:49:51 |
| �m M?u V� T�nh Y�u  - T?p  398                                                                                       | 26478571 |   2008 |        15111 |           10 | 2017-12-04 13:52:53 |
| �m M?u V� T�nh Y�u - T?p 924                                                                                         | 26478571 |   2008 |        15111 |         2540 | 2017-12-04 03:27:01 |
| �m M?u V� T�nh Y�u - T?p 291                                                                                         | 26478571 |   2008 |        15111 |          735 | 2017-12-04 12:43:21 |
| �m M?u V� T�nh Y�u - T?p 593                                                                                         | 26478571 |   2008 |         2625 |         2625 | 2017-12-04 08:29:21 |
|   �m M?u V� T�nh Y�u - T?p  414                                                                                       | 26478571 |   2008 |        15111 |           10 | 2017-12-04 13:54:02 |
| �m M?u V� T�nh Y�u -T?p 545                                                                                          | 26478571 |   2008 |        15111 |         2269 | 2017-12-04 03:41:51 |
| ?�i M?t �n T�nh                                                                                                      | 26478571 |   2008 |        15111 |          190 | 2017-12-04 10:16:52 |
| �m M?u V� T�nh Y�u - T?p 922                                                                                         | 26478571 |   2008 |        15111 |           10 | 2017-12-04 03:51:01 |
| �m M?u V� T�nh Y�u - T?p 285                                                                                         | 26478571 |   2008 |         2448 |         2287 | 2017-12-04 11:13:11 |
| �m M?u V� T�nh Y�u - T?p 594                                                                                         | 26478571 |   2008 |         2639 |         2524 | 2017-12-04 09:02:10 |
| Song Sinh B� ?n                                                                                                      | 26478571 |   2008 |        15111 |         1836 | 2017-12-04 10:20:01 |
| �m M?u v� T�nh y�u T?p 707                                                                                           | 26478571 |   2008 |         2576 |         1758 | 2017-12-04 09:30:10 |
| �m M?u V� T�nh Y�u - T?p 592                                                                                         | 26478571 |   2008 |        15111 |           10 | 2017-12-04 08:29:21 |
| �m M?u V� T�nh Y�u - T?p 709                                                                                         | 26478571 |   2008 |        15111 |           20 | 2017-12-04 09:36:01 |
| �m M?u V� T�nh Y�u - T?p 568                                                                                         | 26478571 |   2008 |        15111 |         1683 | 2017-12-04 06:16:42 |
| �m M?u V� T�nh Y�u - T?p 294                                                                                         | 26478571 |   2008 |        15111 |           16 | 2017-12-04 13:39:32 |
| �m M?u V� T�nh Y�u - T?p 293                                                                                         | 26478571 |   2008 |        15111 |         1145 | 2017-12-04 13:07:02 |
| B� M?t C?a Tr�i Tim                                                                                                  | 26478571 |   2008 |         2636 |           13 | 2017-12-04 10:45:31 |
| H?u tr??ng: H� L? Ti?t M?c L�m Quang L� B? Vi?t H??ng V� Thanh H?ng T? C�n N?ng Tr�n Gh? N�ng Th? T�i Si�u Nh� M�a 2 | 26478571 |   2008 |        15111 |           10 | 2017-12-04 06:17:20 |
| Song Sinh B� ?n - T?p 9                                                                                              | 26478571 |   2008 |        15111 |          747 | 2017-12-04 10:38:22 |
| �m M?u V� T�nh Y�u - T?p 920                                                                                         | 26478571 |   2008 |        15111 |         2337 | 2017-12-04 10:56:21 |
| �m M?u V� T�nh Y�u - T?p 567                                                                                         | 26478571 |   2008 |        15111 |         1202 | 2017-12-04 06:20:51 |
| �m m?u v� T�nh Y�u - T?p 732                                                                                         | 26478571 |   2008 |        15111 |         1842 | 2017-12-04 09:36:42 |
| �m M?u V� T�nh Y�u - T?p 792                                                                                         | 26478571 |   2008 |         2557 |          526 | 2017-12-04 03:16:01 |
+----------------------------------------------------------------------------------------------------------------------+----------+--------+--------------+--------------+---------------------+
34 rows in set (0.00 sec)
anhnongdan commented 6 years ago

About the tracking requests, in about 70s, there's 14 requests sent. -> a request in every 5s

screen shot 2017-12-05 at 10 54 57 am

Take a look at 3 consecutive request parameter -> The ma_st (watched time) and ma_ps (progress) is increased regularly. When adjusting the progress bar of video player, ma_ps is assigned accordingly

Watched time and this media tracking is updated on DB every time tracking request comes to server

screen shot 2017-12-05 at 11 05 48 am
MariaDB [pw2]> select * from piwik_log_media as log_media where idview="pQd4YDuhEzQkDBKs";
+-----------+----------+--------+------------------+-------------+------------+------------+------------+----------------+--------------------------------------------------------------+---------------------+----------------------+--------------+----------------+--------------+
| idvisitor | idvisit  | idsite | idview           | player_name | media_type | resolution | fullscreen | media_title    | resource                                                     | server_time         | time_to_initial_play | watched_time | media_progress | media_length |
+-----------+----------+--------+------------------+-------------+------------+------------+------------+----------------+--------------------------------------------------------------+---------------------+----------------------+--------------+----------------+--------------+
Ag�  | 26757653 |   2000 | pQd4YDuhEzQkDBKs | videojs     |          1 | 1024x576   |          0 | Th?ch C?m ?ang | blob:///http://thvli.vn/881aee7e-d6d0-46ab-a155-3675d74c03bb | 2017-12-05 03:51:03 |                   15 |          405 |           1199 |         2517 |
+-----------+----------+--------+------------------+-------------+------------+------------+------------+----------------+--------------------------------------------------------------+---------------------+----------------------+--------------+----------------+--------------+
1 row in set (3 min 58.08 sec)
anhnongdan commented 6 years ago

Confirm query on individual site results in much better query time. Calc. result match with Web and seems logical.

MariaDB [pw2]> select sum(if(log_media.watched_time >= (log_media.media_length * 0.75), 1, 0)) as 100_perc, sum(if(log_media.watched_time >= (log_media.media_length * 0.5) AND log_media.watched_time < (log_media.media_length * 0.75), 1, 0)) as 75_perc, sum(if(log_media.watched_time >= (log_media.media_length*0.25) AND log_media.watched_time < (log_media.media_length * 0.5), 1, 0)) as 50_perc, sum(if(log_media.watched_time < (log_media.media_length*0.25), 1, 0)) as 25_perc, count(*) as total from piwik_log_media as log_media where log_media.server_time > '2017-12-03 17:00:00' and log_media.server_time < '2017-12-04 16:59:00' and idsite=2017 limit 10;
+----------+---------+---------+---------+-------+
| 100_perc | 75_perc | 50_perc | 25_perc | total |
+----------+---------+---------+---------+-------+
|     6527 |    1910 |    1672 |   14526 | 24635 |
+----------+---------+---------+---------+-------+
1 row in set (0.20 sec)

6527 | 1910 | 1672 | 14526 | 24635 26.49% | 7.75% | 6.79% | 58.96% | 24635

anhnongdan commented 6 years ago

It seems that, on Live channels, percentile counted for watched_time and media_progress is completely the same. While for VoD channels, there're differences.

MariaDB [pw2]> select sum(if(log_media.watched_time >= (log_media.media_length * 0.75), 1, 0)) as 100_perc, sum(if(log_media.watched_time >= (log_media.media_length * 0.5) AND log_media.watched_time < (log_media.media_length * 0.75), 1, 0)) as 75_perc, sum(if(log_media.watched_time >= (log_media.media_length*0.25) AND log_media.watched_time < (log_media.media_length * 0.5), 1, 0)) as 50_perc, sum(if(log_media.watched_time < (log_media.media_length*0.25), 1, 0)) as 25_perc, count(*) as total from piwik_log_media as log_media where log_media.server_time > '2017-12-03 17:00:00' and log_media.server_time < '2017-12-04 16:59:00' and idsite=2000 limit 10;
+----------+---------+---------+---------+-------+
| 100_perc | 75_perc | 50_perc | 25_perc | total |
+----------+---------+---------+---------+-------+
|    11246 |    1297 |    1787 |    6807 | 21137 |
+----------+---------+---------+---------+-------+
1 row in set (0.77 sec)

MariaDB [pw2]> select sum(if(log_media.media_progress >= (log_media.media_length * 0.75), 1, 0)) as 100_perc, sum(if(log_media.media_progress >= (log_media.media_length * 0.5) AND log_media.media_progress < (log_media.media_length * 0.75), 1, 0)) as 75_perc, sum(if(log_media.media_progress >= (log_media.media_length*0.25) AND log_media.media_progress < (log_media.media_length * 0.5), 1, 0)) as 50_perc, sum(if(log_media.media_progress < (log_media.media_length*0.25), 1, 0)) as 25_perc, count(*) as total from piwik_log_media as log_media where log_media.server_time > '2017-12-03 17:00:00' and log_media.server_time < '2017-12-04 16:59:00' and idsite=2000 limit 10;
+----------+---------+---------+---------+-------+
| 100_perc | 75_perc | 50_perc | 25_perc | total |
+----------+---------+---------+---------+-------+
|    16302 |     927 |     841 |    3067 | 21137 |
+----------+---------+---------+---------+-------+
1 row in set (0.73 sec)
anhnongdan commented 6 years ago

Here's the right query to calc. plays, watched_time needs to be > 1, while all included watched_time=0 will be considered 'impression'. Need media_type=1 for video only.

MariaDB [pw2]> select sum(if(log_media.media_progress >= (log_media.media_length * 0.75), 1, 0)) as 100_perc, sum(if(log_media.media_progress >= (log_media.media_length * 0.5) AND log_media.media_progress < (log_media.media_length * 0.75), 1, 0)) as 75_perc, sum(if(log_media.media_progress >= (log_media.media_length*0.25) AND log_media.media_progress < (log_media.media_length * 0.5), 1, 0)) as 50_perc, sum(if(log_media.media_progress < (log_media.media_length*0.25), 1, 0)) as 25_perc, count(*) as total from piwik_log_media as log_media where log_media.server_time > '2017-12-03 17:00:00' and log_media.server_time < '2017-12-04 17:00:00' and idsite=2000 AND watched_time > 1 AND media_type=1;
+----------+---------+---------+---------+-------+
| 100_perc | 75_perc | 50_perc | 25_perc | total |
+----------+---------+---------+---------+-------+
|    10211 |     913 |     826 |    2842 | 14792 |
+----------+---------+---------+---------+-------+
1 row in set (0.74 sec)

MariaDB [pw2]> select sum(if(log_media.media_progress >= (log_media.media_length * 0.75), 1, 0)) as 100_perc, sum(if(log_media.media_progress >= (log_media.media_length * 0.5) AND log_media.media_progress < (log_media.media_length * 0.75), 1, 0)) as 75_perc, sum(if(log_media.media_progress >= (log_media.media_length*0.25) AND log_media.media_progress < (log_media.media_length * 0.5), 1, 0)) as 50_perc, sum(if(log_media.media_progress < (log_media.media_length*0.25), 1, 0)) as 25_perc, count(*) as total from piwik_log_media as log_media where log_media.server_time > '2017-12-03 17:00:00' and log_media.server_time < '2017-12-04 17:00:00' and idsite=2000 ;
+----------+---------+---------+---------+-------+
| 100_perc | 75_perc | 50_perc | 25_perc | total |
+----------+---------+---------+---------+-------+
|    16303 |     928 |     841 |    3068 | 21140 |
+----------+---------+---------+---------+-------+
1 row in set (0.72 sec)
anhnongdan commented 6 years ago

Side issue #9

anhnongdan commented 6 years ago

Added progress based percentile calculation

For idsite=2000:

<result>
<MediaAnalytics_sum_video_plays_pro_25>699</MediaAnalytics_sum_video_plays_pro_25>
<MediaAnalytics_sum_video_plays_pro_50>1314</MediaAnalytics_sum_video_plays_pro_50>
<MediaAnalytics_sum_video_plays_pro_75>1359</MediaAnalytics_sum_video_plays_pro_75>
<MediaAnalytics_sum_video_plays_pro_100>2217</MediaAnalytics_sum_video_plays_pro_100>
<MediaAnalytics_sum_total_video_plays>3382</MediaAnalytics_sum_total_video_plays>
</result>

<result>
<MediaAnalytics_sum_video_plays_25>1569</MediaAnalytics_sum_video_plays_25>
<MediaAnalytics_sum_video_plays_50>443</MediaAnalytics_sum_video_plays_50>
<MediaAnalytics_sum_video_plays_75>293</MediaAnalytics_sum_video_plays_75>
<MediaAnalytics_sum_video_plays_100>1077</MediaAnalytics_sum_video_plays_100>
<MediaAnalytics_sum_total_video_plays>3382</MediaAnalytics_sum_total_video_plays>
</result>