LMS-Community / lms-stats-service

A service to collect analytics data for the Lyrion Music Server Community
https://lyrion.org

Get rid of long tails in the data, and other suggestions #1

Closed michaelherger closed 5 months ago

michaelherger commented 5 months ago

Some feedback from the forums:

michaelherger commented 5 months ago

LMS side of things done in https://github.com/LMS-Community/slimserver/commit/7dbb684726a8663850882826190d847d6fa4e131

terual commented 5 months ago

With respect to "don't report single digit or <5 numbers" in case of OS for instance: maybe it is possible to lump them together to create a category Other?

And I think it would really be nice to have a time series chart for the LMS versions used. That way we can really see in the long term how quickly upgrades are picked up.

terual commented 5 months ago

SQLite example of what I mean:

BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "Test" (
    "ID"    INTEGER,
    "Category"  TEXT,
    PRIMARY KEY("ID" AUTOINCREMENT)
);
INSERT INTO "Test" VALUES (1,'Debian');
INSERT INTO "Test" VALUES (2,'Debian');
INSERT INTO "Test" VALUES (3,'Debian');
INSERT INTO "Test" VALUES (4,'Debian');
INSERT INTO "Test" VALUES (5,'Debian');
INSERT INTO "Test" VALUES (6,'Debian');
INSERT INTO "Test" VALUES (7,'Debian');
INSERT INTO "Test" VALUES (8,'Debian');
INSERT INTO "Test" VALUES (9,'Debian');
INSERT INTO "Test" VALUES (10,'macOS');
INSERT INTO "Test" VALUES (11,'macOS');
INSERT INTO "Test" VALUES (12,'macOS');
INSERT INTO "Test" VALUES (13,'macOS');
INSERT INTO "Test" VALUES (14,'Ubuntu');
INSERT INTO "Test" VALUES (15,'Ubuntu');
COMMIT;
SELECT 
    CASE 
        WHEN category_count > 5 THEN category
        ELSE 'Other'
    END AS category,
    SUM(category_count) AS category_count
FROM (
    SELECT 
        category,
        COUNT(*) AS category_count
    FROM Test
    GROUP BY category
) AS counts
GROUP BY 
    CASE 
        WHEN category_count > 5 THEN category
        ELSE 'Other'
    END;

michaelherger commented 5 months ago

I have a cron job taking snapshots of the data which includes LMS versions:

[
  {
    "date": "2024-04-11",
    "versions": "[{\"8.5.1\":37},{\"9.0.0\":7}]"
  },
  {
    "date": "2024-04-12",
    "versions": "[{\"8.5.1\":61},{\"9.0.0\":18},{\"8.5.0\":1}]"
  },
  {
    "date": "2024-04-13",
    "versions": "[{\"8.5.1\":61},{\"9.0.0\":18},{\"8.5.0\":1}]"
  }
]

But I doubt we'll be happy with Mermaid longer term...
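
For a time series chart, snapshots like the ones above would first need to be flattened into one row per date and version. The following is only a minimal sketch of that step, not the service's actual code: the function name `flatten_snapshots` and the output keys `date`, `version`, and `count` are assumptions made for illustration. Note that `versions` is a JSON string nested inside JSON, so it has to be decoded a second time.

```python
import json


def flatten_snapshots(snapshots):
    """Turn daily snapshots into one record per (date, version) pair."""
    rows = []
    for snap in snapshots:
        # "versions" is itself a JSON-encoded string, so decode it again.
        for entry in json.loads(snap["versions"]):
            # Each entry is a single-key object such as {"8.5.1": 37}.
            for version, count in entry.items():
                rows.append(
                    {"date": snap["date"], "version": version, "count": count}
                )
    return rows


# Two of the snapshots quoted above, kept in their double-encoded form.
snapshots = json.loads(r'''[
  {"date": "2024-04-11", "versions": "[{\"8.5.1\":37},{\"9.0.0\":7}]"},
  {"date": "2024-04-12", "versions": "[{\"8.5.1\":61},{\"9.0.0\":18},{\"8.5.0\":1}]"}
]''')

rows = flatten_snapshots(snapshots)
for row in rows:
    print(row)
```

A flat list like this is the kind of tabular input most charting tools can plot directly, with one line per version.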

terual commented 5 months ago

Migrate everything to https://timvink.github.io/mkdocs-charts-plugin/?

Edit: never mind, I forgot we had the problem with the MkDocs privacy plugin.

michaelherger commented 5 months ago

Basically... back to what you suggested a week or two ago? Yes, I think that looks more powerful.

Edit: with regard to privacy, maybe we can figure this out. We can start with a POC diagram in a branch.

terual commented 5 months ago

I was thinking about creating a minimal MkDocs project where the problem occurs and trying to work from there.

terual commented 5 months ago

I have found the culprit: the navigation.instant feature

terual commented 5 months ago

"We can start with a POC diagram in a branch."

Done (for translations chart): https://github.com/LMS-Community/lms-community.github.io/tree/mkdocs-charts-plugin

sodface commented 5 months ago

Is the raw data available anywhere, and if not, can it be?

sodface commented 5 months ago

Here I guess: https://stats.lms-community.org/api/stats

michaelherger commented 5 months ago

As data is a sensitive topic, I wouldn't want to share all of the raw data. As you've figured out, there's an endpoint to get the aggregate data for those charts. This can obviously be tweaked if needed. You might have seen that I've been tweaking it for a few days. And there's more data in there than the charts currently show (e.g. the installations per LMS version).

michaelherger commented 5 months ago

BTW: I only cut off the plugins in that summary. For all others it's done in the script preparing the data for Lyrion.org. I did it that way because somebody mentioned a developer might not want to see a plugin under development show up in that list, or somebody might have their very own thing.
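
A cut-off in a data-preparation script could look roughly like the sketch below. This is not the actual script (the thread mentions Perl scripts). The function name `lump_small_categories`, the threshold of 5, and the "Other" label are assumptions for illustration, and the sample counts are taken from the SQLite test data earlier in the thread. Unlike that query, which keeps categories with a count greater than 5, this sketch keeps counts of 5 and above, following the "<5" wording.

```python
from collections import Counter


def lump_small_categories(counts, min_count=5, other_label="Other"):
    """Fold every category with fewer than min_count entries into one bucket."""
    lumped = Counter()
    for name, count in counts.items():
        # Keep the category name only if it reaches the threshold.
        key = name if count >= min_count else other_label
        lumped[key] += count
    return dict(lumped)


# Same distribution as the SQLite test table: 9 Debian, 4 macOS, 2 Ubuntu.
os_counts = {"Debian": 9, "macOS": 4, "Ubuntu": 2}
result = lump_small_categories(os_counts)
print(result)  # {'Debian': 9, 'Other': 6}
```

Lumping instead of dropping keeps the totals intact while still hiding one-off entries such as a plugin under development.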

michaelherger commented 5 months ago

Oh, and now I have to check out your branch. Thanks for looking into this!

michaelherger commented 5 months ago

Nice! That's way more powerful and flexible! Here's one with multiple data rows, labels, and interactive popups to get details on some points:

[Screenshot, 2024-04-14 at 21:29:32]

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "LMS Installations by Version",
  "data": {"url": "/analytics/stats.json"},
  "encoding": {
    "x": {"field": "d", "type": "temporal", "title": "Date"},
    "y": {"field": "c", "type": "quantitative", "title": "Installations"},
    "color": {"field": "v", "type": "nominal", "title": "Version"}
  },
  "layer": [
    {
        "mark": {
            "type": "line",
            "point": {
                "filled": false,
                "fill": "white"
            }
        }
    },
    {
        "params": [{
            "name": "hover",
            "select": {"type": "point", "on": "pointerover", "clear": "pointerout"}
        }],
        "mark": {"type": "circle", "tooltip": true},
        "encoding": {
            "opacity": {
                "condition": {"test": {"param": "hover", "empty": false}, "value": 1},
                "value": 0
            },
            "size": {
                "condition": {"test": {"param": "hover", "empty": false}, "value": 48},
                "value": 100
            }
        }
    }]
}

[
    {
        "d": "2024-04-12",
        "v": "8.5.1",
        "c": 61
    },
    {
        "d": "2024-04-12",
        "v": "9.0.0",
        "c": 18
    },
    {
        "d": "2024-04-12",
        "v": "8.5.0",
        "c": 1
    },
    {
        "d": "2024-04-11",
        "v": "8.5.1",
        "c": 37
    },
    {
        "d": "2024-04-11",
        "v": "9.0.0",
        "c": 7
    },
    {
        "d": "2024-04-13",
        "v": "8.5.1",
        "c": 201
    },
    {
        "d": "2024-04-13",
        "v": "9.0.0",
        "c": 75
    },
    {
        "d": "2024-04-13",
        "v": "8.5.0",
        "c": 1
    },
    {
        "d": "2024-04-14",
        "v": "8.5.1",
        "c": 453
    },
    {
        "d": "2024-04-14",
        "v": "9.0.0",
        "c": 154
    },
    {
        "d": "2024-04-14",
        "v": "8.5.0",
        "c": 2
    }
]

terual commented 5 months ago

Looks really, really good! If you are okay with the switch to Vega-Lite and with adjusting all the Perl scripts to output JSON instead of YAML (those Perl idiosyncrasies go way over my head), we can merge the mkdocs-charts-plugin branch.

michaelherger commented 5 months ago

Sure! Can you provide the MkDocs/Vega side of things, with some example JSON, and I'll tweak the script to spit out what you need?

terual commented 5 months ago

Perfect!

terual commented 5 months ago

I have merged into lyrion.org, but I'm still having problems which I thought I had fixed by removing the navigation.instant feature. I will look further into this before continuing with the analytics page.

terual commented 5 months ago

Weird... and now the translations chart works on lyrion.org. Same with you?

michaelherger commented 5 months ago

I'm seeing the following error in the browser's console:

[Warning] WARN – "Loading failed" – "https://lyrion.org/contributing/adding-translations/#current-coverage-of-translations/../..//contributing/strings-coverage.json" – SyntaxError: The string did not match the expected pattern. (vega@5.js, line 1) SyntaxError: The string did not match the expected pattern.json

michaelherger commented 5 months ago

Might be a caching issue: it's working now...

terual commented 5 months ago

Map of installs (world_map.json is generated from https://geojson-maps.kyd.au/):

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "width": 700,
  "height": 400,
  "title": {
    "text": "LMS installations worldwide",
    "fontSize": 20
  },
  "data": {
    "url": "analytics/world_map.json",
    "format": {"property": "features"}
  },
  "projection": {"type": "naturalEarth1"},
  "transform": [
    {
      "lookup": "properties.iso_a2",
      "from": {
        "key": "country",
        "fields": ["installs"],
        "data": {
          "url": "analytics/countries.csv",
          "format": {"type": "csv"}
        } 
      }
    }
  ],
  "mark": {
    "type": "geoshape",
    "stroke": "#141010",
    "strokeWidth": 0.5
  },
  "encoding": {
    "color": {
      "field": "installs",
      "type": "quantitative",
      "scale": {"scheme": "greens"},
      "legend": null
    },
    "tooltip": [
      {"field": "properties.name", "title": "Country"},
      {
        "field": "installs",
        "type": "quantitative",
        "title": "Installs"
      }
    ]
  },
  "config": {"mark": {"invalid": null}}
}

Example CSV data:

country,installs
DE,266
US,150
GB,105
NL,63
FR,59
CH,42
AT,34
CA,25
SE,24
IT,19
BE,15

michaelherger commented 5 months ago

I can't get this to work. I don't even see the browser try to download the data files. Would you have a complete patch/PR with a working example?

terual commented 5 months ago

Yes, coming up

michaelherger commented 5 months ago

We should probably start a new task for this 😁. Another thought: CSV might be less data overhead, but I'd still prefer JSON, as that's pretty native to many tools. We already have JSON & YAML.
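
If the country data were to move from CSV to JSON, the conversion itself is small. The sketch below is an illustration only: the helper name `csv_to_records` is hypothetical, and the output shape (an array of `country`/`installs` objects) is an assumption that mirrors the CSV columns from the example above.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse country,installs CSV text into a list of JSON-ready records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # CSV values arrive as strings, so cast the install count to an integer.
    return [
        {"country": row["country"], "installs": int(row["installs"])}
        for row in reader
    ]


# First rows of the example CSV data from this thread.
sample = "country,installs\nDE,266\nUS,150\nGB,105\n"
records = csv_to_records(sample)
print(json.dumps(records))
```

The JSON form repeats the field names in every record, which is the overhead mentioned above, but it needs no separate parsing step in most tools.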

I was also wondering whether the world_map.json could be shrunk by removing the ton of unused data. But I didn't see many options on that GeoJSON service.
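
One way to shrink the file would be to post-process it locally rather than rely on options of the GeoJSON service. The sketch below assumes the map is a GeoJSON FeatureCollection and that only the `iso_a2` and `name` properties are needed, since those are the two the chart spec above reads. The function name `strip_properties` and the sample feature (dummy extra fields, dummy point geometry) are made up for illustration.

```python
import json

# Properties the chart spec refers to; everything else is dropped.
KEEP = ("iso_a2", "name")


def strip_properties(feature_collection, keep=KEEP):
    """Return a copy of the collection with only the wanted properties."""
    slim_features = []
    for feature in feature_collection["features"]:
        props = feature.get("properties") or {}
        slim_features.append(
            {
                "type": feature["type"],
                "properties": {k: props[k] for k in keep if k in props},
                "geometry": feature["geometry"],  # geometry is left untouched
            }
        )
    return {"type": feature_collection["type"], "features": slim_features}


# Dummy input: real files carry polygons and many more properties.
world = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "iso_a2": "CH",
                "name": "Switzerland",
                "extra_a": 1,
                "extra_b": "x",
            },
            "geometry": {"type": "Point", "coordinates": [0, 0]},
        }
    ],
}

slim = strip_properties(world)
# Compact separators avoid whitespace in the written file.
print(json.dumps(slim, separators=(",", ":")))
```

This only removes unused attributes. The geometry itself, which usually makes up most of the file size, stays as it is.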

terual commented 5 months ago

I did indeed choose CSV because of the overhead, but JSON is fine by me. I am working on a branch with all the mockups for the charts and will share the link here when ready.

terual commented 5 months ago

I had to restructure the JSON a lot; I could not get the "hash" style of JSON to work. See here for the charts: https://github.com/LMS-Community/lms-community.github.io/tree/vegalite

michaelherger commented 5 months ago

Let's consider this done. Thanks!