CONP-PCNO / conp-portal

:bar_chart: The CONP data portal
https://portal.conp.ca/
MIT License
8 stars 24 forks source link

Display popular keyword searches #406

Closed cmadjar closed 3 years ago

cmadjar commented 3 years ago

Purpose

Display on the portal popular keyword searches.

Before implementing that, do some pre-analysis to see if we can get helpful information from that endpoint and how feasible it is to implement.

Context

Matomo offers a nice API capability that allows to grep some summary statistics that we could process and display in addition to the widgets that Matomo already offers.

In order to get an idea of keyword searches, we need to perform some analysis on the data collected by Matomo so the easiest would be to import summary statistics into a table in the database of the portal via the API.

Possible Implementation

Several steps will be required in order to obtain that:

  1. Create a table in the flask database that will contain results from the Matomo API query (matomo_getSiteSearchKeywords - schema to be determined based on the API response and the information we might need)
  2. Create/add to the function update_analytics to run with flask that will update the table above
  3. Query the Matomo API for a given timeframe (day, week, month available. Start maybe with day) to produce summary statistics on the method Actions.getSiteSearchKeywords and insert the results into the matomo_getSiteSearchKeywords table
  4. Create a function that will query matomo_getSiteSearchKeywords and analyse which keyword was used before exiting the search page (to get read of most of the b, bi, big... for bigbrain searches etc...)
  5. Display a table or graph showing popular keywords

Related issues

14

cmadjar commented 3 years ago

410 queries Matomo's API and inserts results in matomo_daily_site_keyword_searches_summary. Columns are:

 id                  = ID
    date                = date
    avg_time_on_page    = average time spent, in seconds, on this page
    bounce_rate         = ratio of visits leaving the website after landing on this page
    exit_nb_visits      = number of visits that finished on this page
    exit_rate           = ratio of visits that do not view any other page after this page
    label               = keyword searched
    nb_hits             = number of views on this page
    nb_pages_per_search = number of pages displayed for the searched keyword
    nb_visits           = number of visits (30 min of inactivity considered a new visit)
    segment             = segment with keyword search
    sum_time_spent      = total time spent on this page, in seconds

Query to select the total number of hits per label.

SELECT label, SUM(nb_hits) as search FROM matomo_daily_site_keyword_searches_summary GROUP BY label ORDER BY search;

My guess is that it is what we wish to display. Results of that query looks like:

{ ... more available below 10 hits }
ubc|11
braincode|12
preventad|12
MRI|13
neuromod|13
mri|15
bids|16
mouse|16
mousebytes|16
m|18
eprs|20
fsl|20
user/register|20
prevent|26
bigbrain|40

We could ignore keywords of one character and display only the keywords with more than 10 hits?

cmadjar commented 3 years ago

Display total cumulative search keywords. Display only keywords searched more than 10 hits to start with.