Closed Hrxn closed 5 years ago
Here is a small Tumblr blog that I used for testing: http://broken-embeds-test.tumblr.com/ It contains 4 videos embedded from Instagram and 4 pictures (posted on Tumblr as links, since posting them as images apparently doesn't work straightforwardly).
Here's the comparison between the commits with the changes to the function. https://github.com/bbolli/tumblr-utils/compare/2c92d2f816ab34ab595d6a2c3defb5bd4525d3b9...1d3b15fec0609f1258d305fff7de95a9e441cc67
The result for 1d3b15fec0609f1258d305fff7de95a9e441cc67
D:\Etc\TUMBLR\1d3b15f>D:\Inst\Python\python.exe D:\Src\tumblr-utils-1d3b15fec0609f1258d305fff7de95a9e441cc67\tumblr_backup.py --save-video broken-embeds-test.tumblr.com
WARNING: Falling back on generic information extractor.r.com: 0 remaining posts to save
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
Unable to download video in post #151625452707
Unable to download video in post #151625405267
Unable to download video in post #151625431177
Unable to download video in post #151625377647
broken-embeds-test.tumblr.com: 8 posts backed up
D:\Etc\TUMBLR\1d3b15f>
The result for 2c92d2f816ab34ab595d6a2c3defb5bd4525d3b9
D:\Etc\TUMBLR\2c92d2f>D:\Inst\Python\python.exe D:\Src\tumblr-utils-2c92d2f816ab34ab595d6a2c3defb5bd4525d3b9\tumblr_backup.py --save-video broken-embeds-test.tumblr.com
WARNING: Falling back on generic information extractor.r.com: broken-embeds-test.tumblr.com: 0 remaining posts to save
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
WARNING: Falling back on generic information extractor.
broken-embeds-test.tumblr.com: 8 posts backed up
D:\Etc\TUMBLR\2c92d2f>
(I also have the console output with 'quiet': 'False' in the YoutubeDL properties if necessary, but it only says that it's failing, not why...)
Comparing the result:
D:\Etc\TUMBLR>dir 1d3b15f\broken-embeds-test.tumblr.com\media
Volume in drive D is Home
Volume Serial Number is 8851-7591
Directory of D:\Etc\TUMBLR\1d3b15f\broken-embeds-test.tumblr.com
File Not Found
D:\Etc\TUMBLR>dir 2c92d2f\broken-embeds-test.tumblr.com\media
Volume in drive D is Home
Volume Serial Number is 8851-7591
Directory of D:\Etc\TUMBLR\2c92d2f\broken-embeds-test.tumblr.com\media
10.10.2016 23:27 <DIR> .
10.10.2016 23:27 <DIR> ..
19.09.2016 09:38 4.733.902 BKh4Z19g6TB_waldbaumalex_Video_by_waldbaumalex.mp4
19.09.2016 10:17 2.068.892 BKh89aYAEqB_waldbaumalex_Video_by_waldbaumalex.mp4
19.09.2016 10:14 4.438.481 BKh8l8dAr67_waldbaumalex_Video_by_waldbaumalex.mp4
19.09.2016 10:12 1.909.457 BKh8TtDgZmG_waldbaumalex_Video_by_waldbaumalex.mp4
4 File(s) 13.150.732 bytes
2 Dir(s) 18.384.117.760 bytes free
D:\Etc\TUMBLR>
As you can see, 1d3b15f doesn't have the media subdir, while 2c92d2f has a media subdir with 4 .mp4 files inside.
@bbolli It would be interesting to know whether this issue only happens on Windows.
Anyway, the culprit seems to be the line that builds the filename: the old variant, media_filename = ydl.prepare_filename(result), can download the files, whereas the new one, media_filename = sanitize_filename(filetmpl % result['entries'][0], restricted=True), doesn't work anymore.
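One possible explanation, not confirmed anywhere in this thread: youtube-dl returns an 'entries' list only for playlist-like results, while a single resolved video comes back as a flat info dict, so indexing result['entries'][0] unconditionally would raise a KeyError for such posts. The sketch below uses plain dicts instead of real youtube-dl results to illustrate the difference. The helper names, the sample dicts, and the '%(id)s.%(ext)s' template are hypothetical, and filename sanitizing is left out.

```python
def pick_video_info(result):
    """Return the info dict to build a filename from.

    Assumption: playlist-like results carry their videos under 'entries',
    while a single video is a flat dict without that key.
    """
    entries = result.get('entries')
    if entries:
        return entries[0]
    return result


def build_media_filename(filetmpl, result):
    """Fill a %-style filename template from the chosen info dict."""
    return filetmpl % pick_video_info(result)


# Hypothetical sample data, shaped like the two cases described above.
single = {'id': 'BKh4Z19g6TB', 'ext': 'mp4'}
playlist = {'_type': 'playlist', 'entries': [single]}
tmpl = '%(id)s.%(ext)s'  # hypothetical template, not the one tumblr_backup uses

print(build_media_filename(tmpl, single))
print(build_media_filename(tmpl, playlist))
```

If this guess is right, a guard like pick_video_info would let both result shapes produce the same filename. Whether that matches what actually happens inside tumblr_backup.py would still need to be checked against the real result objects.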
I'm using Ubuntu 14 LTS.
This error only happens when --save-video is enabled, because neither youtube-dl nor tumblr_backup supports Instagram yet. Please improve this.
This error only happens when --save-video is enabled.
Yes. That's exactly the problem: the usage of youtube-dl within tumblr_backup, obviously.
And youtube-dl does support Instagram:
https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/instagram.py
Can you provide some example Instagram links to reproduce the problem? We'll see if my theory is right..
Can you provide some example Instagram links to reproduce the problem?
Try this:
tumblr_backup.py --save-video --save-audio -k --image-names -N 0 ablogthathasaninstagrampostonit
Be sure youtube-dl is installed via pip.
Result of the Tumblr API for ablogthathasaninstagrampostonit:
{ "meta": { "status": 404, "msg": "Not Found" }, "response": [] }
You sure that's the right one?
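A quick way to check a blog name before running a full backup is to look at the meta.status field of the API response. This is only a minimal sketch that parses the JSON text shown above. The function name blog_exists is made up for illustration, fetching the response is left out, and treating status 200 as "found" is an assumption about the API's success case.

```python
import json


def blog_exists(api_response_text):
    """Return True if the API response text reports status 200 in 'meta'."""
    payload = json.loads(api_response_text)
    status = payload.get('meta', {}).get('status')
    return status == 200


# The response body quoted above for the placeholder blog name.
not_found = '{ "meta": { "status": 404, "msg": "Not Found" }, "response": [] }'
print(blog_exists(not_found))
```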
@Hrxn
You sure that's the right one?
No. To be honest, it's honkawa (NSFW blog).
Ah, okay.
This blog is working fine, but where exactly are the Instagram videos that return an error?
@Hrxn Sorry for the late response.
youtube-dl is installed via pip install youtube_dl, so there's no issue with an out-of-date version.
For the error, see here: https://asciinema.org/a/advempwlelwwpj7w8eqhbubwh
That terminal session was recorded remotely via CodeEnvy, because it takes hours to download from my location.
Okay.. I think I see the issue here..
For example, here are some posts that return ERROR: Unable to extract video url; please report [...], taken from your log:
https://honkawa.tumblr.com/post/127313935430
https://honkawa.tumblr.com/post/127308001840
https://honkawa.tumblr.com/post/127301058340
https://honkawa.tumblr.com/post/127231379175
https://honkawa.tumblr.com/post/127282439995
https://honkawa.tumblr.com/post/127245103290
https://honkawa.tumblr.com/post/127207461220
https://honkawa.tumblr.com/post/127214144045
https://honkawa.tumblr.com/post/127154233910
https://honkawa.tumblr.com/post/127154217275
These are all pictures hosted on Instagram, not videos. (Not even really NSFW, considering Instagram's policies.)
And youtube-dl doesn't work here; it returns the error mentioned in your log because it only accepts videos at the moment.
Every Tumblr post belongs to a certain type (the Tumblr Dashboard now shows Text, Photo, Quote, Link, Chat, Audio, Video, for example).
In the past, it was possible to select Photo, then select 'add photo from URL' and use the link to an Instagram post there. You then had a photo post with the picture linking to that Instagram post, but the photo was also on Tumblr and could be backed up by tumblr-utils etc. I assume this was only introduced pretty recently, maybe since Instagram added these multi-page/multi-photo posts.
This doesn't seem to work any longer. You now have to use the Link type. Or, ironically, and that is also what your example blog (honkawa) is doing, use the post type Video > Add video from web. And youtube-dl doesn't work here, as mentioned.
Example here: https://embedded-demos.tumblr.com/ vs https://embedded-demos.tumblr.com/archive
@bbolli Can you reproduce? Any ideas here, how to work around this kind of "type confusion"?
Sucks that Instagram photos also don't work any longer with tumblr-utils.
Although, the picture still appears to be there as before:
E:\Test\Test>curl -s -o "1.jpg" https://68.media.tumblr.com/f1b98d4636c658d2558eaea7e5615ae7/tumblr_oo5392llzc1wn82de_og_1280.jpg
E:\Test\Test>curl -s -o "2.jpg" https://scontent.cdninstagram.com/t51.2885-15/e35/17125916_1832179880333322_5336448482473410560_n.jpg
SHA256 hash of file 1.jpg:
cd 7a d2 11 d8 6e ec 61 be 4f 08 6b ba 09 b1 f6 e2 83 23 a0 b2 14 5e d7 18 ad 56 51 16 d9 93 c9
CertUtil: -hashfile command completed successfully.
SHA256 hash of file 2.jpg:
cd 7a d2 11 d8 6e ec 61 be 4f 08 6b ba 09 b1 f6 e2 83 23 a0 b2 14 5e d7 18 ad 56 51 16 d9 93 c9
CertUtil: -hashfile command completed successfully.
Don't know about these new multi-page posts, though..
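The same comparison can be done without CertUtil. The sketch below hashes two local files with SHA-256 and reports whether their contents are identical. The function names are made up for illustration, and it assumes both files have already been downloaded (for example as 1.jpg and 2.jpg with the curl commands above).

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Calling same_content('1.jpg', '2.jpg') on the two downloads should return True if the Tumblr copy and the Instagram original really are byte-identical, as the two hashes above suggest.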
There's nothing tumblr-backup can do about these kinds of JavaScript-infested posts. The basic problem is that each platform is building silos in which they try to keep their content only to themselves.
I guess your best bet is to write a patch to youtube-dl that can extract the images from the Instagram embeds.
Okay... Decided to look into this again, with my own test case..
Let me clean up the old stuff here first..