Closed atticus-sullivan closed 2 years ago
I'm not quite sure whether the later deduplication is still necessary, so I have left it in place for now.
If I am not mistaken, converting this to a set would discard the order in which we collected the URLs. The download order would then be arbitrary.
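To illustrate the concern, here is a minimal sketch of deduplicating without losing the collection order. The helper name `dedupe_keep_order` and the example URLs are made up for illustration and are not taken from the project.

```python
def dedupe_keep_order(urls):
    """Drop repeated URLs while keeping the first occurrence of each."""
    # dict keys are unique and, since Python 3.7, preserve insertion order,
    # whereas a set gives no ordering guarantee at all.
    return list(dict.fromkeys(urls))


# Hypothetical URLs in the order they were collected.
video_urls = [
    "https://example.org/v/b",
    "https://example.org/v/a",
    "https://example.org/v/b",
    "https://example.org/v/c",
]

unique_urls = dedupe_keep_order(video_urls)
print(unique_urls)
```

Unlike `list(set(video_urls))`, this keeps the first-seen order, so any numbering that depends on it stays stable.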
Yes, that's right. To me the download order doesn't matter, as I typically try to stay up to date with the videos, so most of the time there are only one or two videos to download.
The reason I want to avoid capturing the m3u8 URLs multiple times is that, in the case of Panopto and with #6, this takes quite a while (there are some sleeps).
Is there any reason why the download order would be important?
Note:
I also implemented some caching (storing some sort of map `video_id` -> download URL and title) in this regard (not quite ready for a PR), since some lecturers upload many videos on Panopto at once and link them only week by week on Moodle (so checking for/downloading new videos takes quite some time when the m3u8 URLs of all already downloaded videos have to be captured again).
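A rough sketch of what such a cache could look like, assuming a JSON file on disk. The function names (`load_cache`, `save_cache`, `resolve`), the entry layout, and the `fake_capture` stand-in for the slow m3u8 capture are all hypothetical and not the actual implementation mentioned above.

```python
import json
from pathlib import Path


def load_cache(path):
    """Read the video_id -> {download_url, title} map, or start empty."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_cache(path, cache):
    """Write the map back to disk as JSON."""
    Path(path).write_text(json.dumps(cache, indent=2), encoding="utf-8")


def resolve(video_id, cache, capture):
    """Return the cached entry, calling the slow capture() only on a miss."""
    if video_id not in cache:
        download_url, title = capture(video_id)
        cache[video_id] = {"download_url": download_url, "title": title}
    return cache[video_id]


# Stand-in for the slow browser-based capture (an assumption for this demo).
calls = []


def fake_capture(video_id):
    calls.append(video_id)
    return f"https://example.org/{video_id}/master.m3u8", f"Lecture {video_id}"


cache = {}
resolve("abc", cache, fake_capture)
resolve("abc", cache, fake_capture)  # served from the cache, no second capture
print(calls)
```

Keying on the `video_id` rather than the title matters here, since titles are not guaranteed to be unique.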
As some lecturers have published different lectures with identical titles, we require the order to either rename the duplicate or, as we do it now, number all lectures in sequence for them to be sortable in folders.
Note that we must not at any point rely on a filename / title for separating playlist files or skip acquiring a URL based on a filename, as they are not unique.
This is why the de-duplication currently happens after we have acquired all the .m3u8 links, as these are required to identify unique lectures.
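As a small illustration of why the order matters for naming, the following sketch numbers lectures in sequence so that identical titles still produce distinct, sortable filenames. The function name, the `NN_title` scheme, and the `.mp4` extension are assumptions for this example, not the project's actual naming.

```python
def numbered_filenames(titles, extension=".mp4"):
    """Prefix each title with its position so duplicates stay distinct."""
    # Pad to at least two digits so that 10 sorts after 09 in a file browser.
    width = max(2, len(str(len(titles))))
    return [
        f"{index:0{width}d}_{title}{extension}"
        for index, title in enumerate(titles, start=1)
    ]


# Two lectures share the title "Exercise"; only their position tells them apart.
titles = ["Introduction", "Exercise", "Exercise"]
names = numbered_filenames(titles)
print(names)
```

If the input order were arbitrary, the same lecture could receive a different number on each run, which is the problem described above.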
> Note that we must not at any point rely on a filename / title for separating playlist files or skip acquiring a URL based on a filename, as they are not unique.

But I think the `video_id` should be, right? By using the set, deduplication is done based on the `video_id`s.
But I agree this doesn't solve the issue regarding enumerating files with the same title in the right order.
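Both points could be combined by deduplicating on the `video_id` while keeping the first-seen order. The sketch below assumes the id is carried in an `id` query parameter of the viewer URL; that URL layout, the host name, and the helper names are assumptions for illustration only.

```python
from urllib.parse import parse_qs, urlparse


def video_id(url):
    """Extract the id from the query string (assumed layout: ...?id=<video_id>)."""
    query = parse_qs(urlparse(url).query)
    return query["id"][0]


def dedupe_by_video_id(urls):
    """Keep the first URL seen for each video_id, in collection order."""
    seen = set()
    unique = []
    for url in urls:
        vid = video_id(url)
        if vid not in seen:
            seen.add(vid)
            unique.append(url)
    return unique


# Hypothetical viewer links; the third points at the same video as the first.
links = [
    "https://panopto.example.org/Viewer.aspx?id=aaa",
    "https://panopto.example.org/Viewer.aspx?id=bbb",
    "https://panopto.example.org/Viewer.aspx?id=aaa&start=0",
]

unique_links = dedupe_by_video_id(links)
print(unique_links)
```

The set is only used for membership checks; the result list carries the order, so sequential numbering afterwards remains possible.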
Deduplication has been optimized in the latest release. URLs are now deduplicated in an order-preserving way before they are accessed.
Currently, as far as I can see, deduplication is only done after collecting the m3u8 links. It is more efficient to deduplicate the `video_urls` retrieved from the folder view (that way, no video's m3u8 is searched for multiple times).