endlessm / kolibri-explore-plugin

The kolibri plugin to add the custom channel representation
MIT License

Download thumbnails for content that is not available locally #549

Closed: wjt closed this 1 year ago

wjt commented 1 year ago

In Kolibri's terms, thumbnails are not part of the channel's metadata but are content items in their own right.

When we download metadata for collections other than the one the user picks (#548), we will also want to download thumbnails for the content in those collections so that users can explore it visually (#545).

wjt commented 1 year ago

I wrote:

Can someone more experienced than me check how many files and how much data this corresponds to for all the thumbnails on key.endlessos.org (or give me a crash course on the Kolibri data model and how to get an interactive Python console with the Django ORM models loaded? :) )

@manuq wrote:

My pleasure! The tricky part is having the corresponding nodes in the database:

from kolibri.core.content.models import ContentNode

total = 0
# For each node, add the size of its first thumbnail file (if it has one).
for c in ContentNode.objects.all():
    f = next((f for f in c.files.all() if f.thumbnail), None)
    if f is not None:
        total += f.get_file_size()
print(total)  # total size in bytes

To get a first idea, I ran the above against my current database, which has a freshly imported artist-0001. I also measured the thumbnails of available content only, by changing the query to ContentNode.objects.filter(available=True):

  • total thumbs size = 142.9 Megabytes (142954746 bytes) for all 955 nodes in all 10 channels imported by artist-0001.json
  • available thumbs size = 5.2 Megabytes (5249466 bytes) for 58 available nodes

The real measure would be to import all the metadata for each of the JSON files that represent the EK collection and filter by their nodes. Something like:

ek_node_ids = ['1520f018610256549c98ca0140cceebe', 'deb6566eede6513c9f262f367c2b5f8d', ...]
ContentNode.objects.filter(id__in=ek_node_ids)
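
A minimal pure-Python sketch of that "real measure", assuming the thumbnail rows have already been pulled out of the database into plain records. The ThumbnailRecord type and thumbnail_total function are hypothetical, not Kolibri APIs. The sketch restricts the sum to a given set of node IDs, as the id__in filter does, and can optionally count each file only once by checksum, on the assumption that a thumbnail shared by several nodes is stored and downloaded only once.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ThumbnailRecord:
    """Hypothetical flattened view of one thumbnail file row."""
    node_id: str
    checksum: str  # identifies the stored file; shared thumbnails repeat it
    size: int      # bytes


def thumbnail_total(records: Iterable[ThumbnailRecord],
                    node_ids: Set[str],
                    dedupe: bool = True) -> int:
    """Sum thumbnail sizes for nodes in node_ids.

    With dedupe=True (an assumption about storage), a file referenced
    by several nodes is counted once.
    """
    seen: Set[str] = set()
    total = 0
    for rec in records:
        if rec.node_id not in node_ids:
            continue  # node is outside the collection being measured
        if dedupe:
            if rec.checksum in seen:
                continue  # same file already counted for another node
            seen.add(rec.checksum)
        total += rec.size
    return total


records = [
    ThumbnailRecord("node-a", "c1", 1000),
    ThumbnailRecord("node-b", "c1", 1000),  # shares node-a's thumbnail
    ThumbnailRecord("node-c", "c2", 500),
]
ek_node_ids = {"node-a", "node-b"}
print(thumbnail_total(records, ek_node_ids))                # 1000
print(thumbnail_total(records, ek_node_ids, dedupe=False))  # 2000
```

Whether shared thumbnails are common enough to change the totals in this thread is not established here; the dedupe switch just makes the difference measurable.
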
dbnicholson commented 1 year ago

I ran the following script on the production key.endlessos.org instance:

#!/usr/bin/env python3

from collections import defaultdict
from kolibri.core.content.models import File, ChannelMetadata
from operator import itemgetter

def nice_size(num):
    # Human-readable size. Units step by 1024, so KB/MB/GB here are binary units.
    for unit in ('bytes','KB','MB','GB'):
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0
    return "%3.1f %s" % (num, 'TB')

total = 0
channels = defaultdict(int)
# Every File row flagged as a thumbnail, whether or not its content is available.
# select_related fetches local_file and contentnode in the same query.
thumbnails = File.objects.filter(thumbnail=True).select_related('local_file', 'contentnode')
for thumbnail in thumbnails:
    size = thumbnail.local_file.file_size or 0  # treat a missing size as 0
    total += size
    channels[thumbnail.contentnode.channel_id] += size

nice_total = nice_size(total)
print(f'Total {nice_total} ({total})')
# Per-channel totals, largest first.
for channel_id, thumbnail_size in sorted(channels.items(), key=itemgetter(1), reverse=True):
    channel_name = ChannelMetadata.objects.get(id=channel_id).name
    channel_nice = nice_size(thumbnail_size)
    print(f'{channel_id} ({channel_name}) {channel_nice} ({thumbnail_size})')

I ran it by feeding it to the shell on stdin, as in kolibri manage shell < sizes.py, as the kolibri user with all the environment variables from /etc/default/kolibri set.

Here's what it came up with:

Total 1.7 GB (1844833202)
c9d7f950ab6b5a1199e3d6c10d7f0103 (Khan Academy (English - US curriculum)) 1.1 GB (1186440792)
7aca54975a2c415c888d5fe73e0e8163 (हिन्दी) 166.5 MB (174574651)
59b8deeb90f544da923187e77c8d3820 (wikiHow) 88.1 MB (92409113)
914fee213ee146de869016c287116b23 (Chapter Books) 55.2 MB (57849018)
000409f81dbe5d1ba67101cb9fed4530 (Touchable Earth (en)) 50.4 MB (52894914)
bbb4ea407a3c450cb18cbaa76f2d75cd (CSpathshala (English)) 47.5 MB (49830241)
08897e003ea9489eb3d86fc94ba08c21 (Українською) 22.6 MB (23665950)
74f36493bb475b62935fa8705ed59fed (Thoughtful Learning) 20.8 MB (21826123)
f061fce103ff5d4e9b8433e67802e666 (Arts & Crafts) 20.3 MB (21326248)
79cd09863eed51e98576c35ede6f9c9d (Cooking) 16.0 MB (16797114)
fc47aee82e0153e2a30197d3fdee1128 (Open Stax) 15.4 MB (16113723)
2f95235c3709511fa12d007f31ed6a7b (STEAM) 9.3 MB (9803758)
efcc464be5a85ba5a58d1636b00313fc (Gardening) 9.1 MB (9556010)
f5f6729f95b55753badeaa066fa6e986 (Healthy Body) 7.6 MB (7921762)
e9d0d54d209344849e9bed0aa8c222ad (Sikana DIY) 7.4 MB (7737800)
3fcffebc58d15175b948b140434ef6e6 (Sports) 7.2 MB (7531679)
0418cc231e9c5513af0fff9f227f7172 (Free English with Hello Channel) 7.0 MB (7367609)
97111903de564de49483a9705d41a8ac (Career Girls) 6.1 MB (6359663)
ee52db4a62a94e9683599af8782f2d03 (The SciGirls Collection (en español)) 5.5 MB (5807639)
1b1fc9bd453a4c52bb5628d9ae804ede (The SciGirls Collection) 5.5 MB (5782572)
92e96efc082e5c62b0aac3847bdcdb33 (Staff Playlist) 4.7 MB (4940529)
e11462f71c6f5472b113311c69071b05 (Dance) 4.7 MB (4934302)
197934f144305350b5820c7c4dd8e194 (PhET Interactive Simulations (English)) 4.3 MB (4508692)
1520f018610256549c98ca0140cceebe (Virtual Field Trips) 4.0 MB (4198784)
359e048230974c8f80db1a95dc80d544 (EiE Familias) 3.9 MB (4092851)
9c33eb395508447d96c96682cb18c57a (Techbridge Girls @ Home) 3.6 MB (3802707)
f1ada7abc4194ff48a958337a31972c7 (EiE Families) 3.6 MB (3749048)
bcc6e12a0ddf4a17a8b600c6b880e3ed (Common Sense Student Resources) 3.3 MB (3499386)
2091ca47ff544c96b4ae02b3a92346e1 (TED-Ed) 3.1 MB (3298810)
bf0260ed911f44cda27a263db93a8512 (49ers EDU Digital Playbook) 2.6 MB (2697563)
4968191fba07548c9592fc174a70b5d6 (Beauty) 2.5 MB (2610982)
57e23812e0dc562581958e39acedd717 (Games & Gaming) 2.5 MB (2573844)
e409b964366a59219c148f2aaa741f43 (Blockly Games) 2.2 MB (2260272)
4e413158eac55422a5343af9fcfa8d59 (Healthy Mind) 2.1 MB (2162902)
2b43973f53f1538bad5ece63ad847606 (Financial Literacy) 2.0 MB (2143450)
3160899a73564d8a8467284d9219b91c (Terminal Two) 2.0 MB (2124581)
057f871caa405ec29d62ba0523c193d7 (Music) 2.0 MB (2072904)
bf36d8e7e1ee56b194fe52cafbfd9db3 (Fashion) 1.8 MB (1863063)
a8e6591f1afa426d859318a0a29d1237 (SAMHSA) 1.5 MB (1587918)
eb4373b5da054c07879d0c969dc1976a (Virtual Science Teachers) 1.2 MB (1281591)
b40491d1ef8b5506b8c6ae861372e9de (Jewelry Making) 1.1 MB (1191929)
79a50be66bad5eb686c42617c914fd45 (Careers) 908.4 KB (930183)
85b42a40745f4e2392ed62e72d4dad6e (OceanX) 616.0 KB (630786)
f62db29be20453c4a267132e93a9e602 (Wikipedia) 77.9 KB (79746)

Note that I did not filter on available, since the current expectation is that we want to ship all the content thumbnails for a channel so that the whole channel can be browsed.
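
The two measurements in this thread use different unit bases: manuq's figures are decimal (142954746 bytes reported as 142.9 Megabytes), while nice_size divides by 1024, so its KB/MB/GB are really KiB/MiB/GiB. A small sketch that formats the same byte count both ways; format_size is a hypothetical helper, not part of either script.

```python
def format_size(num_bytes: int, base: int = 1024) -> str:
    """Format a byte count with IEC (base=1024) or SI (base=1000) units."""
    if base == 1024:
        units = ("bytes", "KiB", "MiB", "GiB", "TiB")
    elif base == 1000:
        units = ("bytes", "kB", "MB", "GB", "TB")
    else:
        raise ValueError("base must be 1000 or 1024")
    size = float(num_bytes)
    # Divide down until the value fits below the base or we run out of units.
    for unit in units[:-1]:
        if size < base:
            return f"{size:.1f} {unit}"
        size /= base
    return f"{size:.1f} {units[-1]}"


total = 1844833202  # grand total from the production run, in bytes
print(format_size(total))             # 1.7 GiB (nice_size prints "1.7 GB")
print(format_size(total, base=1000))  # 1.8 GB
```

So the 1.7 GB total reported by the script corresponds to roughly 1.8 GB in decimal units, which is the base manuq's numbers use.
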

dbnicholson commented 1 year ago

I started looking at how we would ingest all the thumbnails for a channel rather than just the thumbnails for the desired content nodes. It looks like it will need some work. Kolibri's importing works at the content node level, but thumbnails are files one level below content nodes, and Kolibri has no knob to "import just the thumbnail for a content node but not the actual content". I think there are 3 options:

  1. Add an all_thumbnails option to Kolibri's content import methods ASAP. This has to be wired all the way from the API down to the file selection function.
  2. Provide our own import API handler in the explore plugin that duplicates Kolibri's remote download manager and file selection function.
  3. Provide an out-of-band method in the explore plugin for fetching all the content thumbnails for a given channel and then manually importing them into the Kolibri database.
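
Option 1 amounts to a change in how files are picked for download. The following is a hypothetical sketch: the FileRecord type, the select_files function, and its parameters are illustrative, not Kolibri's actual file selection code or the upstream implementation. Nodes the user asked for contribute all their files; with all_thumbnails enabled, every other node in the channel contributes only its thumbnail files.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for a file row attached to a content node."""
    node_id: str
    name: str
    thumbnail: bool


def select_files(files: Iterable[FileRecord],
                 selected_node_ids: Set[str],
                 all_thumbnails: bool = False) -> List[FileRecord]:
    """Choose which files to download for a channel import (illustrative only)."""
    chosen = []
    for f in files:
        if f.node_id in selected_node_ids:
            chosen.append(f)  # requested content: take every file
        elif all_thumbnails and f.thumbnail:
            chosen.append(f)  # unrequested node: take only its thumbnail
    return chosen


files = [
    FileRecord("video-1", "video.mp4", thumbnail=False),
    FileRecord("video-1", "thumb.png", thumbnail=True),
    FileRecord("video-2", "video.mp4", thumbnail=False),
    FileRecord("video-2", "thumb.png", thumbnail=True),
]
picked = select_files(files, {"video-1"}, all_thumbnails=True)
print([(f.node_id, f.name) for f in picked])
# [('video-1', 'video.mp4'), ('video-1', 'thumb.png'), ('video-2', 'thumb.png')]
```

The flag only changes the final filter, which is why the work lies mostly in threading it from the API down to this point.
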
dbnicholson commented 1 year ago

I asked on Slack whether the all_thumbnails feature would be acceptable, and Richard said yes, so I started working on it. I'm a bit bogged down in testing, but it doesn't seem too hard.

dbnicholson commented 1 year ago

I opened https://github.com/learningequality/kolibri/issues/10770 upstream. https://github.com/learningequality/kolibri/compare/develop...dbnicholson:kolibri:all-thumbnails has my WIP branch, but I'm still working through some test failures.

dbnicholson commented 1 year ago

The PR is ready upstream: https://github.com/learningequality/kolibri/pull/10780.

erikos commented 1 year ago

The app has been updated to Kolibri v0.16.0-alpha15, so it is ready to make use of the all_thumbnails feature.

manuq commented 1 year ago

I suggest going back to the initial PR that just added the "all_thumbnails" option so we can merge it now. Downloading the thumbnails in the background requires more work in the frontend, not just the backend.

manuq commented 1 year ago

Cleanup merged.

dbnicholson commented 1 year ago

@manuq wrote:

I suggest going back to the initial PR that just added the "all_thumbnails" option so we can merge it now. Downloading the thumbnails in the background requires more work in the frontend, not just the backend.

Sorry, I missed this message. I can still do that and punt on backgrounding, but I think the current iteration of #584 works nicely minus the frontend integration.

manuq commented 1 year ago

Deployed:

  • kolibri-explore-plugin: v6.17.0
  • Android internal testers: Ladybird 6.17-348
  • Windows alpha test flight: v6.17.0

erikos commented 1 year ago

Depending on how quickly you press the "show me" button after the content has been downloaded during onboarding, the thumbnails may or may not be there, because the thumbnail download might still be in progress. There is no way yet to tell the user "new thumbnails, wanna refresh?"; this is covered by https://github.com/orgs/endlessm/projects/3/views/8?pane=issue&itemId=31379558