Closed wjt closed 1 year ago
I wrote:
Can someone more experienced than me check how many files and how much data this corresponds to for all the thumbnails on key.endlessos.org (or give me a crash course on the Kolibri data model and how to get an interactive Python console with the Django ORM models loaded? :) )
@manuq wrote:
My pleasure! The tricky part is to have the corresponding nodes in the database:
from kolibri.core.content.models import ContentNode

total = 0
for c in ContentNode.objects.all():
    f = next((f for f in c.files.all() if f.thumbnail), None)
    if f is not None:
        total += f.get_file_size()
print(total)
To get a first idea, I ran the above against my current database, which has a fresh artist-0001 just imported. I also calculated the thumbnail size for available content only, by changing the query to ContentNode.objects.filter(available=True):
- total thumbs size = 142.9 Megabytes (142954746 bytes) for all 955 nodes in all 10 channels imported by artist-0001.json
- available thumbs size = 5.2 Megabytes (5249466 bytes) for 58 available nodes
The real measure would be to import all the metadata for each of the JSON files that represent the EK collection and filter by their node IDs. Something like:
ek_node_ids = ['1520f018610256549c98ca0140cceebe', 'deb6566eede6513c9f262f367c2b5f8d', ...]
ContentNode.objects.filter(id__in=ek_node_ids)
I ran the following script on the prod key instance:
#!/usr/bin/env python3
from collections import defaultdict
from operator import itemgetter

from kolibri.core.content.models import File, ChannelMetadata


def nice_size(num):
    # Format a byte count using 1024-based units.
    for unit in ('bytes', 'KB', 'MB', 'GB'):
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0
    return "%3.1f %s" % (num, 'TB')


total = 0
channels = defaultdict(int)  # channel_id -> total thumbnail bytes
for thumbnail in File.objects.filter(thumbnail=True):
    size = thumbnail.local_file.file_size
    total += size
    channels[thumbnail.contentnode.channel_id] += size

nice_total = nice_size(total)
print(f'Total {nice_total} ({total})')

# Largest channels first.
for channel_id, thumbnail_size in sorted(channels.items(), key=itemgetter(1), reverse=True):
    channel_name = ChannelMetadata.objects.get(id=channel_id).name
    channel_nice = nice_size(thumbnail_size)
    print(f'{channel_id} ({channel_name}) {channel_nice} ({thumbnail_size})')
I ran it by feeding it to the Kolibri shell on stdin, like kolibri manage shell < sizes.py, as the kolibri user with all the environment variables from /etc/default/kolibri set.
Here's what it came up with:
Total 1.7 GB (1844833202)
c9d7f950ab6b5a1199e3d6c10d7f0103 (Khan Academy (English - US curriculum)) 1.1 GB (1186440792)
7aca54975a2c415c888d5fe73e0e8163 (हिन्दी) 166.5 MB (174574651)
59b8deeb90f544da923187e77c8d3820 (wikiHow) 88.1 MB (92409113)
914fee213ee146de869016c287116b23 (Chapter Books) 55.2 MB (57849018)
000409f81dbe5d1ba67101cb9fed4530 (Touchable Earth (en)) 50.4 MB (52894914)
bbb4ea407a3c450cb18cbaa76f2d75cd (CSpathshala (English)) 47.5 MB (49830241)
08897e003ea9489eb3d86fc94ba08c21 (Українською) 22.6 MB (23665950)
74f36493bb475b62935fa8705ed59fed (Thoughtful Learning) 20.8 MB (21826123)
f061fce103ff5d4e9b8433e67802e666 (Arts & Crafts) 20.3 MB (21326248)
79cd09863eed51e98576c35ede6f9c9d (Cooking) 16.0 MB (16797114)
fc47aee82e0153e2a30197d3fdee1128 (Open Stax) 15.4 MB (16113723)
2f95235c3709511fa12d007f31ed6a7b (STEAM) 9.3 MB (9803758)
efcc464be5a85ba5a58d1636b00313fc (Gardening) 9.1 MB (9556010)
f5f6729f95b55753badeaa066fa6e986 (Healthy Body) 7.6 MB (7921762)
e9d0d54d209344849e9bed0aa8c222ad (Sikana DIY) 7.4 MB (7737800)
3fcffebc58d15175b948b140434ef6e6 (Sports) 7.2 MB (7531679)
0418cc231e9c5513af0fff9f227f7172 (Free English with Hello Channel) 7.0 MB (7367609)
97111903de564de49483a9705d41a8ac (Career Girls) 6.1 MB (6359663)
ee52db4a62a94e9683599af8782f2d03 (The SciGirls Collection (en español)) 5.5 MB (5807639)
1b1fc9bd453a4c52bb5628d9ae804ede (The SciGirls Collection) 5.5 MB (5782572)
92e96efc082e5c62b0aac3847bdcdb33 (Staff Playlist) 4.7 MB (4940529)
e11462f71c6f5472b113311c69071b05 (Dance) 4.7 MB (4934302)
197934f144305350b5820c7c4dd8e194 (PhET Interactive Simulations (English)) 4.3 MB (4508692)
1520f018610256549c98ca0140cceebe (Virtual Field Trips) 4.0 MB (4198784)
359e048230974c8f80db1a95dc80d544 (EiE Familias) 3.9 MB (4092851)
9c33eb395508447d96c96682cb18c57a (Techbridge Girls @ Home) 3.6 MB (3802707)
f1ada7abc4194ff48a958337a31972c7 (EiE Families) 3.6 MB (3749048)
bcc6e12a0ddf4a17a8b600c6b880e3ed (Common Sense Student Resources) 3.3 MB (3499386)
2091ca47ff544c96b4ae02b3a92346e1 (TED-Ed) 3.1 MB (3298810)
bf0260ed911f44cda27a263db93a8512 (49ers EDU Digital Playbook) 2.6 MB (2697563)
4968191fba07548c9592fc174a70b5d6 (Beauty) 2.5 MB (2610982)
57e23812e0dc562581958e39acedd717 (Games & Gaming) 2.5 MB (2573844)
e409b964366a59219c148f2aaa741f43 (Blockly Games) 2.2 MB (2260272)
4e413158eac55422a5343af9fcfa8d59 (Healthy Mind) 2.1 MB (2162902)
2b43973f53f1538bad5ece63ad847606 (Financial Literacy) 2.0 MB (2143450)
3160899a73564d8a8467284d9219b91c (Terminal Two) 2.0 MB (2124581)
057f871caa405ec29d62ba0523c193d7 (Music) 2.0 MB (2072904)
bf36d8e7e1ee56b194fe52cafbfd9db3 (Fashion) 1.8 MB (1863063)
a8e6591f1afa426d859318a0a29d1237 (SAMHSA) 1.5 MB (1587918)
eb4373b5da054c07879d0c969dc1976a (Virtual Science Teachers) 1.2 MB (1281591)
b40491d1ef8b5506b8c6ae861372e9de (Jewelry Making) 1.1 MB (1191929)
79a50be66bad5eb686c42617c914fd45 (Careers) 908.4 KB (930183)
85b42a40745f4e2392ed62e72d4dad6e (OceanX) 616.0 KB (630786)
f62db29be20453c4a267132e93a9e602 (Wikipedia) 77.9 KB (79746)
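A note on units when reading this output: the nice_size helper in the script divides by 1024, so its "MB" and "GB" are binary units (MiB, GiB). The sketch below reuses that helper unchanged and compares it with a decimal (1000-based) variant. nice_size_decimal is a hypothetical helper added here only for the comparison; it is not part of the script that was run.

```python
def nice_size(num):
    """Binary (1024-based) formatting, as in the script above."""
    for unit in ('bytes', 'KB', 'MB', 'GB'):
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0
    return "%3.1f %s" % (num, 'TB')


def nice_size_decimal(num):
    """Hypothetical decimal (1000-based) variant, for comparison only."""
    for unit in ('bytes', 'kB', 'MB', 'GB'):
        if num < 1000.0:
            return "%3.1f %s" % (num, unit)
        num /= 1000.0
    return "%3.1f %s" % (num, 'TB')


total = 1844833202  # total thumbnail bytes reported above
print(nice_size(total))          # binary units
print(nice_size_decimal(total))  # decimal units
```

The same byte count reads as about 1.7 in binary units and about 1.8 in decimal units, which is worth keeping in mind when comparing against the earlier figures in this thread (142.9 Megabytes for 142954746 bytes appears to be a decimal figure).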
Note that I did not filter on available, as the current expectation is that we'd want to ship all the content thumbnails for a channel so that the full channel can be browsed.
I started looking at how we would ingest all the thumbnails for a channel rather than just the thumbnails for the desired content nodes. It looks like it will need some work. Kolibri's importing works at the content node level, but thumbnails are a level below content nodes. Kolibri has no "import just the thumbnail for a content node but not the actual content" knob. I think there are 3 options:
- Add an all_thumbnails option to Kolibri's content import methods ASAP. This has to be wired all the way from the API interface down to the file selection function.

I asked on Slack whether the all_thumbnails feature would be acceptable and Richard said yes, so I started working on that. I'm a bit bogged down in testing, but it doesn't seem too hard.
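To make the idea concrete, here is a minimal sketch of what an all_thumbnails flag could mean at the file selection step. It is illustrative only: FileRecord and select_files are hypothetical names over plain in-memory data, not Kolibri's models or its actual file selection function, and the upstream implementation may look quite different.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical stand-in for a file attached to a content node."""
    node_id: str
    name: str
    size: int
    thumbnail: bool = False


def select_files(files, wanted_node_ids, all_thumbnails=False):
    """Pick the files to import (illustrative, not Kolibri's API).

    Files of the wanted nodes are always selected. With all_thumbnails=True,
    thumbnails of every other node are selected too, without their content.
    """
    wanted = set(wanted_node_ids)
    selected = []
    for f in files:
        if f.node_id in wanted:
            selected.append(f)
        elif all_thumbnails and f.thumbnail:
            selected.append(f)
    return selected


files = [
    FileRecord("node-a", "a.mp4", 5_000_000),
    FileRecord("node-a", "a.png", 20_000, thumbnail=True),
    FileRecord("node-b", "b.mp4", 7_000_000),
    FileRecord("node-b", "b.png", 30_000, thumbnail=True),
]

default = select_files(files, ["node-a"])
with_thumbs = select_files(files, ["node-a"], all_thumbnails=True)
print([f.name for f in default])
print([f.name for f in with_thumbs])
```

With the flag set, the thumbnail of the unselected node is picked up while its content file is still skipped, which is the behaviour needed to browse a full channel visually.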
I opened https://github.com/learningequality/kolibri/issues/10770 upstream. https://github.com/learningequality/kolibri/compare/develop...dbnicholson:kolibri:all-thumbnails has my WIP branch, but I'm still working through some test failures.
PR ready upstream https://github.com/learningequality/kolibri/pull/10780.
The app has been updated to Kolibri v0.16.0-alpha15. Ready to make use of the all_thumbnails feature.
I suggest going back to the initial PR that was just adding the "all_thumbnails" option so we can merge it now. Doing it in the background requires a bit more work in the frontend, not just the backend.
Cleanup merged.
I suggest going back to the initial PR that was just adding the "all_thumbnails" option so we can merge it now. Doing it in the background requires a bit more work in the frontend, not just the backend.
Sorry, I missed this message. I can still do that and punt on backgrounding, but I think the current iteration of #584 works nicely minus the frontend integration.
Deployed:
- kolibri-explore-plugin v6.17.0
- Android internal testers: Ladybird 6.17-348
- Windows alpha test flight: v6.17.0
Depending on how fast you press the "show me" button after the content has been downloaded during onboarding, the thumbnails might or might not be there, because the thumbnail download might still be ongoing. There is no way yet to tell the user "new thumbnails, wanna refresh?". This is covered in https://github.com/orgs/endlessm/projects/3/views/8?pane=issue&itemId=31379558
In Kolibri's terms, these are not part of the channel's metadata but are content items in their own right.
When we download metadata for additional collections besides the one the user picks (#548) we'll then want to download thumbnails for the content in those collections, so that users can explore it (#545) visually.