assert youtube link format before youtube_dl

HarHarLinks commented 3 years ago

Fixes #47

I think this should work... I tested:

```python
import re

# Raw string, so the backslashes reach the regex engine unchanged
pattern = r"(https?://(www\.)?(youtube\.com/(watch\?[a-zA-Z0-9=\&]*v=|embed/)|youtu.be/)[a-zA-Z0-9]{11})"

text = '''bit of trash text youtube.com/feed/subscriptions
https://www.youtube.com/watch?v=ch69W2l1Mak <iframe width="1869"
height="763" src="https://www.youtube.com/embed/ch69W2l1Mak"
title="YouTube video player" frameborder="0" allow="accelerometer;
autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
https://youtu.be/ch69W2l1Mak?t=10 https://www.youtube.com/watch?v=ch69W
http://youtube.com/watch?v=ch69W2l1Mak'''

# findall returns a tuple of groups per match; u[0] is the outermost group, i.e. the whole link
print([u[0] for u in re.findall(pattern, text)])
```

Output:

```
['https://www.youtube.com/watch?v=ch69W2l1Mak', 'https://www.youtube.com/embed/ch69W2l1Mak', 'https://youtu.be/ch69W2l1Mak', 'http://youtube.com/watch?v=ch69W2l1Mak']
```

- trash text is ignored
- subscriptions and other non-video pages are ignored
- a proper https desktop link matches
- the surrounding embed HTML is ignored
- but the embed link itself is extracted
- youtu.be short links are extracted, ignoring additional arguments (`?t=10`)
- a broken link (ID too short) is ignored
- links match even without https (plain http) and without www
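The idea in the title, checking that a string really is a YouTube video link before handing it to youtube_dl, could be sketched as a small guard like the one below. This is a hypothetical helper (`is_youtube_video_link` and `checked_link` are not from this PR), and the pattern is an assumed tightening of the one tested above: the dot in `youtu.be` is escaped, the ID class also allows `-` and `_` (which occur in real video IDs), and a negative lookahead rejects IDs longer than 11 characters. `re.match` only anchors at the start, so trailing arguments such as `?t=10` are still accepted.

```python
import re

# Hypothetical tightened variant of the pattern above (assumptions, not the merged code):
#  - "youtu\.be" escapes the dot, so "youtuXbe" no longer matches
#  - the ID class includes "-" and "_", which appear in real video IDs
#  - the negative lookahead rejects IDs longer than 11 characters
YOUTUBE_VIDEO = re.compile(
    r"https?://(?:www\.)?"
    r"(?:youtube\.com/(?:watch\?[a-zA-Z0-9=&]*v=|embed/)|youtu\.be/)"
    r"[A-Za-z0-9_-]{11}(?![A-Za-z0-9_-])"
)


def is_youtube_video_link(url: str) -> bool:
    """Return True if `url` starts with a YouTube watch, embed or youtu.be link."""
    return YOUTUBE_VIDEO.match(url.strip()) is not None


def checked_link(url: str) -> str:
    """Hypothetical guard: raise instead of passing a malformed link on to youtube_dl."""
    if not is_youtube_video_link(url):
        raise ValueError(f"not a YouTube video link: {url!r}")
    return url
```

Rejecting bad input up front means youtube_dl only ever sees links that at least have the right shape; whether the video exists is still left to youtube_dl.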

I also updated the other embed regex to improve the matching accuracy regarding broken IDs.

Feel free to edit if you disagree with any of these cases. For example, the if/else could be removed altogether, but I suppose using only the embed link is more accurate. Or perhaps keep only the second `if`, if only embeds are needed?
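The "prefer the embed link" choice discussed above might look roughly like this. It is a hypothetical sketch, not the repository's actual if/else: `pick_video_links`, `ANY_LINK` and `EMBED_LINK` are illustrative names, and both patterns assume the `-`/`_` ID characters and the escaped `youtu\.be` dot. All groups are non-capturing, so `findall` returns whole links directly.

```python
import re
from typing import List

# Hypothetical patterns (assumptions, not the merged code)
ANY_LINK = re.compile(
    r"https?://(?:www\.)?"
    r"(?:youtube\.com/(?:watch\?[a-zA-Z0-9=&]*v=|embed/)|youtu\.be/)"
    r"[A-Za-z0-9_-]{11}"
)
EMBED_LINK = re.compile(r"https?://(?:www\.)?youtube\.com/embed/[A-Za-z0-9_-]{11}")


def pick_video_links(text: str) -> List[str]:
    """Prefer embed links; fall back to any recognised YouTube link."""
    embeds = EMBED_LINK.findall(text)
    if embeds:
        return embeds
    return ANY_LINK.findall(text)
```

Dropping the fallback branch would give the embed-only variant; dropping the embed branch would give the "remove the if/else" variant.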

Romern commented 3 years ago

nice thx

HarHarLinks commented 3 years ago

me every time you merge my PRs 😅

Romern / syncMyMoodle

assert youtube link format before youtube_dl #66