elvisyjlin / media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
MIT License
371 stars, 49 forks

Only downloads photos in Twitter #3

Closed Purefreeman closed 6 years ago

Purefreeman commented 6 years ago

As stated in the title, I can only download pictures. I tried 4 different accounts and still got the same result.

elvisyjlin commented 6 years ago

Yes, as stated in the README, media-scraper currently cannot download BLOB URL videos from Twitter. I'll keep working on it, but I can't guarantee when it will be done.

Purefreeman commented 6 years ago

OK, sorry for the misunderstanding.

elvisyjlin commented 6 years ago

Hi @Purefreeman, I've updated mediascraper.twitter, and it now supports downloading both photos and videos from Twitter.

Purefreeman commented 6 years ago

I tried multiple times with the new update, but I keep getting this error:

  File "C:\Users\dolap_000\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\dolap_000\Desktop\media-scraper-master\mediascraper\twitter.py", line 18, in <module>
    tasks = scraper.scrape(username)
  File "C:\Users\dolap_000\Desktop\media-scraper-master\mediascrapers.py", line 398, in scrape
    img_url, vid_url = get_twitter_video_url(li['data-item-id'])
  File "C:\Users\dolap_000\Desktop\media-scraper-master\util\twitter.py", line 13, in get_twitter_video_url
    return post['posterImage'], post['track']['playbackUrl'].rsplit('?', 1)[0]
KeyError: 'posterImage'

elvisyjlin commented 6 years ago

Could you please provide an example Twitter username so that I can reproduce the problem and fix it? I've tested media-scraper on the Twitter account realDonaldTrump, which contains a bunch of photos and videos.

Purefreeman commented 6 years ago

python -m mediascraper.twitter purefreeman
Starting PhantomJS web driver...
Web driver ".\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe" not found.
Start downloading the web driver...
Web driver ".\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe" has been downloaded successfully.
.\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
C:\Users\dolap_000\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Logging in as "Kheshig"...
Crawling...
Traceback (most recent call last):
  File "C:\Users\dolap_000\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\dolap_000\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\dolap_000\Desktop\media-scraper-master\b\media-scraper-master\mediascraper\twitter.py", line 18, in <module>
    tasks = scraper.scrape(username)
  File "C:\Users\dolap_000\Desktop\media-scraper-master\b\media-scraper-master\mediascrapers.py", line 398, in scrape
    img_url, vid_url = get_twitter_video_url(li['data-item-id'])
  File "C:\Users\dolap_000\Desktop\media-scraper-master\b\media-scraper-master\util\twitter.py", line 13, in get_twitter_video_url
    return post['posterImage'], post['track']['playbackUrl'].rsplit('?', 1)[0]
KeyError: 'posterImage'

elvisyjlin commented 6 years ago

I've updated it to handle the JSON parsing exception. Please take a look.
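
For readers following the traceback above, here is a minimal sketch of one way to guard against the missing 'posterImage' key. It is not the actual change made in the repository. The function name extract_video_urls and the example payloads are hypothetical; only the 'posterImage' and 'track' / 'playbackUrl' keys and the rsplit call come from the traceback.

```python
def extract_video_urls(post):
    """Return (poster_image_url, video_url) from a decoded video payload.

    Hypothetical helper: 'post' is assumed to be the dict that the
    traceback's get_twitter_video_url indexes into. If the expected keys
    are absent (for example, the tweet has no video), return (None, None)
    instead of raising KeyError.
    """
    try:
        poster = post['posterImage']
        playback = post['track']['playbackUrl']
    except (KeyError, TypeError):
        # KeyError: a key is missing; TypeError: 'track' is not a dict.
        return None, None
    # Drop the query string, as the original line in the traceback does.
    return poster, playback.rsplit('?', 1)[0]


if __name__ == '__main__':
    # Made-up payloads for illustration only.
    with_video = {
        'posterImage': 'https://example.com/poster.jpg',
        'track': {'playbackUrl': 'https://example.com/video.m3u8?tag=1'},
    }
    print(extract_video_urls(with_video))
    print(extract_video_urls({'track': {}}))
```

A caller could then skip any item for which both values are None rather than aborting the whole crawl.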

Purefreeman commented 6 years ago

Yes, it seems to be working, but it doesn't download some GIFs.

elvisyjlin commented 6 years ago

Could you give me an example? From what I've observed, GIFs are rendered as MP4s in Twitter posts, and mediascraper crawls and downloads them.

Purefreeman commented 6 years ago

I can't seem to find a good account to show this or recreate it. Thanks.