nixxo closed this issue 3 years ago
Please give an example with the URL and the values of the variables http_host and qualities[i].
Without any additional info, my guess is that the variables contain a backslash (\) somewhere, which the regex is interpreting as a group reference.
Sorry for the lack of info, here's more:
so, the manifest url that works is
https://videodemand-vh.akamaihd.net/i/encoded/2020/11/22/1606032590423_uomo-ucciso-da-uno-squale-in-australia_,web_low,web_med,web_high,web_hd,.mp4.csmil/index_0_av.m3u8?null=0
and using the REPL_REGEX it "extracts" the tuple
#0 tuple(3)
[0] => str(72) "encoded/2020/11/22/1606032590423_uomo-ucciso-da-uno-squale-in-australia_"
[1] => str(31) "web_low,web_med,web_high,web_hd"
[2] => str(4) ".mp4"
generating the mp4 direct url
http://videoplatform.sky.it/encoded/2020/11/22/1606032590423_uomo-ucciso-da-uno-squale-in-australia_web_low.mp4
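For reference, here is a minimal sketch of the working case. The pattern is only an assumed stand-in shaped to produce the tuple above (the real REPL_REGEX in common.py may differ), and http_host is hard-coded for illustration.

```python
import re

# Assumed stand-in for REPL_REGEX; the real pattern in common.py may differ.
ASSUMED_REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'

m3u8_url = (
    'https://videodemand-vh.akamaihd.net/i/encoded/2020/11/22/'
    '1606032590423_uomo-ucciso-da-uno-squale-in-australia_'
    ',web_low,web_med,web_high,web_hd,.mp4.csmil/index_0_av.m3u8?null=0'
)

# The three captured groups: path prefix, comma-separated qualities, suffix.
groups = re.match(ASSUMED_REPL_REGEX, m3u8_url).groups()
qualities = groups[1].split(',')

http_host = 'videoplatform.sky.it'  # hard-coded here for illustration
# Same replacement shape as in the thread: \1 + quality + \3.
replacement = 'http' + r'://%s/\1%s\3' % (http_host, qualities[0])
http_url = re.sub(ASSUMED_REPL_REGEX, replacement, m3u8_url)
print(http_url)
```

Here the quality starts with a letter ("web_low"), so the \1 reference stays unambiguous.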
The manifest that causes the problem, on the other hand, is:
https://gediusod-vh.akamaihd.net/i/repubblicatv/file/2020/09/22/731397/731397-video-rrtv-,650,200,400,1200,1800,2500,3500,4500,-s200922_iacoboni_salvini.mp4.csmil/index_3_av.m3u8?null=0
that generates the tuple
#0 tuple(3)
[0] => str(54) "repubblicatv/file/2020/09/22/731397/731397-video-rrtv-"
[1] => str(36) "650,200,400,1200,1800,2500,3500,4500"
[2] => str(29) "-s200922_iacoboni_salvini.mp4"
and the resulting mp4 url is wrong:
http://media.gedidigital.it/J00-s200922_iacoboni_salvini.mp4
But, like I said, if I do the same operation one step at a time it works. Only the "condensed" one-line form causes problems.
OK, figured out the problem.
The replacement is
r'://%s/\1%s\3' % ( http_host, qualities[i] )
but if qualities[i] is a number, that is a problem: its digits get attached to the \1, which becomes \1<number>, and that breaks the replacement.
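A small sketch that seems to reproduce this. The pattern is again an assumed stand-in for REPL_REGEX, and build_url is a hypothetical helper, not youtube-dl code. As far as I understand the re replacement syntax, a backslash followed by three octal digits is read as an octal character escape, and a backslash followed by two digits is read as a two-digit group number.

```python
import re

# Assumed stand-in for REPL_REGEX; the real pattern in common.py may differ.
ASSUMED_REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'

m3u8_url = (
    'https://gediusod-vh.akamaihd.net/i/repubblicatv/file/2020/09/22/731397/'
    '731397-video-rrtv-,650,200,400,1200,1800,2500,3500,4500,'
    '-s200922_iacoboni_salvini.mp4.csmil/index_3_av.m3u8?null=0'
)
http_host = 'media.gedidigital.it'


def build_url(quality):
    """Hypothetical helper: same replacement shape as in the thread."""
    replacement = 'http' + r'://%s/\1%s\3' % (http_host, quality)
    return re.sub(ASSUMED_REPL_REGEX, replacement, m3u8_url)


# The template contains \11200: \112 is taken as an octal escape ('J'),
# followed by the literal '00', so group 1 is never inserted.
wrong_url = build_url('1200')
print(wrong_url)  # http://media.gedidigital.it/J00-s200922_iacoboni_salvini.mp4

# The template contains \11800: '8' is not an octal digit, so \11 is taken
# as a reference to group 11, which does not exist.
try:
    build_url('1800')
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)
```

Building the string step by step (group 1 + quality + group 3 with plain concatenation) never goes through the replacement template parser, which would explain why the step-by-step version works.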
OK, solution found: https://github.com/ytdl-org/youtube-dl/commit/193422e12a98ebcc49a215cf3667c7fce593f25c#commitcomment-44741426
Instead of \1 use \g<1>, so the group number cannot merge with the digits that follow it.
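A minimal sketch of that fix, under the same assumptions as before (assumed stand-in pattern, hypothetical build_url helper). The \g<1> form delimits the group number explicitly, so numeric qualities are no longer ambiguous.

```python
import re

# Assumed stand-in for REPL_REGEX; the real pattern in common.py may differ.
ASSUMED_REPL_REGEX = r'https?://[^/]+/i/([^,]+),([^/]+),([^/]+)\.csmil/.+'

m3u8_url = (
    'https://gediusod-vh.akamaihd.net/i/repubblicatv/file/2020/09/22/731397/'
    '731397-video-rrtv-,650,200,400,1200,1800,2500,3500,4500,'
    '-s200922_iacoboni_salvini.mp4.csmil/index_3_av.m3u8?null=0'
)
http_host = 'media.gedidigital.it'


def build_url(quality):
    """Hypothetical helper using \\g<N> instead of \\N in the replacement."""
    replacement = 'http' + r'://%s/\g<1>%s\g<3>' % (http_host, quality)
    return re.sub(ASSUMED_REPL_REGEX, replacement, m3u8_url)


qualities = re.match(ASSUMED_REPL_REGEX, m3u8_url).group(2).split(',')
urls = [build_url(q) for q in qualities]
for url in urls:
    print(url)
```

Passing a function as the replacement argument of re.sub would be another way to sidestep the template syntax, but \g<1> is the smaller change.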
Description
I'm experimenting a bit with an extractor and trying to use _extract_akamai_formats in common.py. It basically takes the HLS manifest URL and recreates the direct HTTP URL for the mp4 file.
But some m3u8 manifests seem to cause a problem in the re.sub call that I cannot understand.
the line that generates the problem is this one:
But if I reproduce each step of that line separately, the code runs without problems.
Reading the traceback, it seems to me that the problem is in the regex library. Can somebody explain it to me?