4pr0n / ripme

Downloads albums in bulk
MIT License

tumblr post with inline images only returns non-inline images #143

Closed: nameplate1 closed this issue 9 years ago

nameplate1 commented 9 years ago

Example: ripping this post downloads a single photo, but the remaining 3 are not pulled.

4pr0n commented 9 years ago

Here's exactly what the API is returning for that post:

{
    "meta": {
        "status": 200,
        "msg": "OK"
    },
    "response": {
        "blog": {
            "title": "▽IαнƒУ▽",
            "name": "iahfy",
            "posts": 2162,
            "url": "http://iahfy.tumblr.com/",
            "updated": 1420951773,
            "description": "<center>\"Ї αм ℌεґε ƒøґ Уøʊ\"",
            "is_nsfw": false,
            "ask": true,
            "ask_page_title": "Ask me anything",
            "ask_anon": true,
            "share_likes": true,
            "likes": 1702
        },
        "posts": [
            {
                "blog_name": "iahfy",
                "id": 103668250722,
                "post_url": "http://iahfy.tumblr.com/post/103668250722/i-brought-you-some-tea-i-thought-you-might-be",
                "slug": "i-brought-you-some-tea-i-thought-you-might-be",
                "type": "photo",
                "date": "2014-11-26 21:50:38 GMT",
                "timestamp": 1417038638,
                "state": "published",
                "format": "html",
                "reblog_key": "zrJbnaLb",
                "tags": [
                    "korrasami",
                    "korra",
                    "asami",
                    "legend of korra",
                    "lok",
                    "korra you dork",
                    "asami can see right through you",
                    "iahfyart",
                    "redraw"
                ],
                "short_url": "http://tmblr.co/ZLBsam1WZ6-XY",
                "highlighted": [ ],
                "note_count": 10350,
                "caption": "<blockquote>\n<p><strong><em><span> </span></em></strong><em><span>”</span></em><em><span>I brought you some tea</span><span>. I thought you might be cold out here”</span></em></p>\n</blockquote>\n<p><em><span><img alt=\"\" src=\"https://31.media.tumblr.com/bd07d980eede42bb0bbcb18850748c46/tumblr_inline_nfngdc3O3d1t0qgmr.jpg\"/></span></em></p>\n\n<p><img alt=\"\" src=\"https://31.media.tumblr.com/0004fb38d7a76aeefc0beb97cc5248ca/tumblr_inline_nfngdpw6LP1t0qgmr.jpg\"/></p>\n<p><img alt=\"\" src=\"https://31.media.tumblr.com/e39cf62adc91d0e4824b004f7821ee35/tumblr_inline_nfnge0GttL1t0qgmr.jpg\"/></p>\n<p><strike>well played korra</strike></p>",
                "image_permalink": "http://iahfy.tumblr.com/image/103668250722",
                "photos": [
                    {
                        "caption": "",
                        "alt_sizes": [
                            {
                                "width": 800,
                                "height": 602,
                                "url": "http://40.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_1280.jpg"
                            },
                            {
                                "width": 500,
                                "height": 376,
                                "url": "http://40.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_500.jpg"
                            },
                            {
                                "width": 400,
                                "height": 301,
                                "url": "http://41.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_400.jpg"
                            },
                            {
                                "width": 250,
                                "height": 188,
                                "url": "http://40.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_250.jpg"
                            },
                            {
                                "width": 100,
                                "height": 75,
                                "url": "http://41.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_100.jpg"
                            },
                            {
                                "width": 75,
                                "height": 75,
                                "url": "http://41.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_75sq.jpg"
                            }
                        ],
                        "original_size": {
                            "width": 800,
                            "height": 602,
                            "url": "http://40.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_1280.jpg"
                        }
                    }
                ]
            }
        ],
        "total_posts": 1
    }
}

RipMe looks at every entry in the photos array, takes its original_size URL, and downloads it.
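
As a rough illustration of that logic, here is a Python sketch. It is not RipMe's actual code; the function name original_size_urls is made up for this example, and the sample is a trimmed copy of the response above.

```python
import json


def original_size_urls(api_response: dict) -> list:
    """Collect the original_size URL of every photo in every post."""
    urls = []
    for post in api_response.get("response", {}).get("posts", []):
        # Posts without a "photos" array (e.g. text posts) contribute nothing.
        for photo in post.get("photos", []):
            url = photo.get("original_size", {}).get("url")
            if url:
                urls.append(url)
    return urls


# Trimmed copy of the API response shown above: one post, one photo.
sample = json.loads("""
{
    "response": {
        "posts": [
            {
                "id": 103668250722,
                "type": "photo",
                "photos": [
                    {
                        "original_size": {
                            "width": 800,
                            "height": 602,
                            "url": "http://40.media.tumblr.com/b4ce56d2c36f03fa56ef613cd0166e91/tumblr_nfngt5hdmO1tl5boeo1_1280.jpg"
                        }
                    }
                ]
            }
        ]
    }
}
""")

found = original_size_urls(sample)
print(len(found))  # 1: only the single entry under "photos" is seen
```

The three images embedded in the caption never show up here, because the sketch (like the behavior described above) only reads the photos array.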

This Tumblr post has only 1 entry under photos, but its description (caption) includes HTML <img> tags that embed 3 more images.
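
For reference, pulling those embedded images out of the caption could look something like the Python sketch below. It uses the standard library's html.parser; the class and function names are hypothetical, and the caption is a shortened copy of the one in the response above.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Records the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed here as well.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def inline_image_urls(caption_html: str) -> list:
    """Return the URLs of all images embedded in a caption."""
    parser = ImgSrcCollector()
    parser.feed(caption_html)
    parser.close()
    return parser.sources


# Shortened copy of the "caption" field from the response above.
caption = (
    '<blockquote>\n<p><em><span>I brought you some tea</span></em></p>\n</blockquote>\n'
    '<p><em><span><img alt="" src="https://31.media.tumblr.com/bd07d980eede42bb0bbcb18850748c46/tumblr_inline_nfngdc3O3d1t0qgmr.jpg"/></span></em></p>\n\n'
    '<p><img alt="" src="https://31.media.tumblr.com/0004fb38d7a76aeefc0beb97cc5248ca/tumblr_inline_nfngdpw6LP1t0qgmr.jpg"/></p>\n'
    '<p><img alt="" src="https://31.media.tumblr.com/e39cf62adc91d0e4824b004f7821ee35/tumblr_inline_nfnge0GttL1t0qgmr.jpg"/></p>\n'
    '<p><strike>well played korra</strike></p>'
)

urls = inline_image_urls(caption)
print(len(urls))  # 3
```

A real HTML parser is used here rather than a regular expression so that attribute order and quoting in the caption do not matter.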

If this is a widespread thing on tumblr, I can look into scraping images from the description. I'm worried it will end up downloading a ton of tiny "smilie" / emoticon images, or other irrelevant images.
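
One possible guard against the emoticon problem is a minimum-size filter, sketched below in Python. Everything here is an assumption for illustration: the Candidate type, the 100-pixel threshold, and the example URLs are invented, and the dimensions would have to be read from the image itself since the caption HTML does not carry them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """An image found in a caption, with dimensions obtained separately."""
    url: str
    width: int
    height: int


def drop_tiny_images(candidates, min_side=100):
    """Keep only images whose shorter side is at least min_side pixels."""
    return [c for c in candidates if min(c.width, c.height) >= min_side]


# Hypothetical inputs: one content-sized image and one emoticon-sized image.
candidates = [
    Candidate("https://example.com/inline_photo.jpg", 500, 376),
    Candidate("https://example.com/smiley.gif", 16, 16),
]

kept = drop_tiny_images(candidates)
print([c.url for c in kept])  # ['https://example.com/inline_photo.jpg']
```

The right threshold would depend on what inline images on Tumblr actually look like, which is the open question raised above.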

If you have more examples of this besides that one post, let me know and I'll look.