Closed by SlowLogicBoy 5 years ago
If I remember correctly, they are cut off by YouTube and the href
is rewritten to go through YouTube's redirect.
I see. I will need to debug this to see whether I can get something out of that; if not, oh well.
void EscapeYoutubeHyperlinks(IElement descriptionNode)
{
    var hyperLinks = descriptionNode.GetElementsByTagName("a");
    foreach (var hyperlink in hyperLinks.Cast<IHtmlAnchorElement>())
    {
        // Skip links that have no query string
        if (string.IsNullOrWhiteSpace(hyperlink.Search))
            continue;
        // The original URL is stored percent-encoded in the "q" parameter
        var queryParts = hyperlink.Search.Split('&', StringSplitOptions.RemoveEmptyEntries);
        var url = queryParts.SingleOrDefault(s => s.StartsWith("q="))?.Substring(2);
        // Search starts with '?', so "q" may also be the very first parameter
        if (string.IsNullOrWhiteSpace(url))
            url = queryParts.SingleOrDefault(s => s.StartsWith("?q="))?.Substring(3);
        if (string.IsNullOrWhiteSpace(url))
            continue;
        // Decode the value and overwrite both href and data-url with it
        url = Uri.UnescapeDataString(url);
        hyperlink.Href = url;
        hyperlink.Dataset["url"] = url;
    }
}

// Usage:
var descriptionNode = watchPage.QuerySelector("p#eow-description");
EscapeYoutubeHyperlinks(descriptionNode);
This changes the link from:
<a href="/redirect?redir_token=y3v1wFxmzoIFfMuVsvc86NSD7UF8MTUzOTMzMDQ3M0AxNTM5MjQ0MDcz&q=http%3A%2F%2Freol.jp&event=video_description&v=EFTV3IIjeNw" class="yt-uix-sessionlink " data-target-new-window="True" data-sessionlink="itct=CDQQ6TgYACITCKXvvoHz_d0CFdOdwQodFM0L1Sj4HUjc8Y2RyLu1qhA" data-url="/redirect?redir_token=y3v1wFxmzoIFfMuVsvc86NSD7UF8MTUzOTMzMDQ3M0AxNTM5MjQ0MDcz&q=http%3A%2F%2Freol.jp&event=video_description&v=EFTV3IIjeNw" target="_blank" rel="nofollow noopener">http://reol.jp</a>
Into:
<a href="http://reol.jp" class="yt-uix-sessionlink " data-target-new-window="True" data-sessionlink="itct=CDQQ6TgYACITCKXvvoHz_d0CFdOdwQodFM0L1Sj4HUjc8Y2RyLu1qhA" data-url="http://reol.jp" target="_blank" rel="nofollow noopener">http://reol.jp</a>
How is this link rendered now, without the proposed solution?
The current link is:
/redirect?redir_token=y3v1wFxmzoIFfMuVsvc86NSD7UF8MTUzOTMzMDQ3M0AxNTM5MjQ0MDcz&q=http%3A%2F%2Freol.jp&event=video_description&v=EFTV3IIjeNw
which is, as you said, a YouTube redirect link.
I convert that URL to http://reol.jp, because YouTube stores the original URL in the q parameter (q=http%3A%2F%2Freol.jp), and URL-decoding it gives q=http://reol.jp.
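To make the decoding step concrete without needing a parsed page, here is a minimal sketch that works on the raw href string alone. `RedirectLinks` and `TryGetRedirectTarget` are hypothetical names for illustration, not part of YoutubeExplode or AngleSharp, and the sketch assumes the target is always carried in a `q` parameter as in the link above.

```csharp
using System;
using System.Linq;

static class RedirectLinks
{
    // Hypothetical helper: returns the decoded value of the "q" parameter
    // of a redirect href, or null when the href has no usable "q" parameter.
    public static string TryGetRedirectTarget(string href)
    {
        // Everything after the first '?' is treated as the query string
        var queryStart = href.IndexOf('?');
        if (queryStart < 0)
            return null;

        var encoded = href.Substring(queryStart + 1)
            .Split('&')
            .Where(p => p.StartsWith("q=", StringComparison.Ordinal))
            .Select(p => p.Substring(2))
            .FirstOrDefault();

        // Percent-decode, e.g. %3A -> ':' and %2F -> '/'
        return string.IsNullOrWhiteSpace(encoded)
            ? null
            : Uri.UnescapeDataString(encoded);
    }
}
```

Calling `RedirectLinks.TryGetRedirectTarget("/redirect?redir_token=abc&q=http%3A%2F%2Freol.jp&event=video_description")` returns `http://reol.jp`. Stripping the leading `?` before splitting means `q` is found whether or not it is the first parameter, and `FirstOrDefault` avoids an exception if `q` ever appears more than once.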
Can you try and see if it works on all videos? Particularly interested in videos with really long links in the description. I remember there were two types of encoding that they used.
From this video: https://www.youtube.com/watch?v=2mmZZEUbM4I By using that code I got:
descriptionNode.GetElementsByTagName("a").Cast<IHtmlAnchorElement>().Select(a => a.Href).ToList():
[0] [string]:"https://djs3rl.com/shop/Like-This"
[1] [string]:"https://itunes.apple.com/au/album/like-this-feat.-krystal-single/id1196696115"
[2] [string]:"https://play.spotify.com/album/0q4Y9yzlqlTwchSYC6VzVq"
[3] [string]:"https://play.google.com/store/music/album?id=Bauz32emv4rdpfyo3xhcxtqpa6e&tid=song-Tmfsybxlezihqdx4nr5d25orj3e"
[4] [string]:"http://classic.beatport.com/release/like-this-dj-edit/1933795"
[5] [string]:"https://www.trackitdown.net/track/s3rl-feat-krystal/like-this-dj-edit/hardcore/11083573.html"
[6] [string]:"https://soundcloud.com/s3rl/like-this-s3rl-feat-krystal"
[7] [string]:"https://osu.ppy.sh/s/566554"
[8] [string]:"https://www.youtube.com/user/SlenderTheMan22"
[9] [string]:"https://www.youtube.com/channel/UCv2mQRbD_rWDMbIL-scMcgw"
[10] [string]:"https://www.instagram.com/kazumi_mai/"
[11] [string]:"https://djs3rl.com/"
Some of the URLs were cut off in the displayed text, but since I decode the redirects, I get the full URLs. Note:
I'm trying to refactor the architecture a bit to make this easier to implement. Work in progress: https://github.com/Tyrrrz/YoutubeExplode/tree/refactor-parsers
Done. The format hasn't changed, but the links are now never cut off in description.
For example, in this video description there are quite a few links, but they are cut off. I would like to get those hyperlinks somehow. Something like a string RawDescription property, without that .TextEx() thingy? Raw HTML is fine with me.