elvisyjlin / media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
MIT License
385 stars, 49 forks

Instagram scraper not working #10

Closed: archmord closed this issue 5 years ago

archmord commented 5 years ago

When I try to scrape an Instagram account, I get the following:

    PS C:\Users\User\media-scraper> python -m mediascraper.instagram sigridupdating
    Starting PhantomJS web driver...
    .\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
    C:\Python\Python37\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
      warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
    Crawling...
    Traceback (most recent call last):
      File "C:\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "C:\Python\Python37\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\User\media-scraper\mediascraper\instagram.py", line 16, in <module>
        tasks = scraper.scrape(username)
      File "C:\Users\User\media-scraper\mediascrapers.py", line 238, in scrape
        data = self.getJsonData(username)
      File "C:\Users\User\media-scraper\mediascrapers.py", line 227, in getJsonData
        content = self._driver.find_element_by_tag_name('pre').text
      File "C:\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 530, in find_element_by_tag_name
        return self.find_element(by=By.TAG_NAME, value=name)
      File "C:\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
        'value': value})['value']
      File "C:\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
        self.error_handler.check_response(response)
      File "C:\Python\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with tag name 'pre'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"90","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:54137","User-Agent":"selenium/3.141.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"tag name\", \"value\": \"pre\", \"sessionId\": \"70cf5fb0-6c21-11e9-975b-e51852ab1518\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/70cf5fb0-6c21-11e9-975b-e51852ab1518/element"}}
    Screenshot: available via screen

Any idea what is causing this?

elvisyjlin commented 5 years ago

The old invocation, python3 -m mediascraper.instagram sigridupdating, is out of date. Please run git pull and then try python3 m-scraper.py rq instagram sigridupdating. Thank you.

archmord commented 5 years ago

OK, so it worked, but the files are saved with a .net extension.

elvisyjlin commented 5 years ago

Have you run git pull again? I believe I have fixed it.
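
For readers who hit the same symptom: the thread does not say what caused the .net suffix, so the following is only a guess at one plausible mechanism. If a downloader takes everything after the last dot of the full URL, a hostname inside the query string can leak into the file name. The sketch below contrasts that with parsing the URL path only. The function name `extension_from_url` and the example URL are hypothetical and are not taken from media-scraper.

```python
import os
from urllib.parse import urlparse


def extension_from_url(url: str, default: str = "") -> str:
    """Return the lower-cased file extension found in the URL *path* only.

    The query string and fragment are ignored, so dots that appear in
    parameters (for example a CDN hostname) cannot affect the result.
    """
    path = urlparse(url).path
    ext = os.path.splitext(path)[1]
    return ext.lower() or default


# Hypothetical media URL whose query string happens to contain a hostname.
url = "https://example.com/media/12345_n.jpg?host=cdn.example.net"

# Naive approach: text after the last dot of the whole URL.
naive = "." + url.rsplit(".", 1)[-1]

print(naive)                    # the hostname's suffix, not a media type
print(extension_from_url(url))  # the extension of the file in the path
```

Parsing the path with `urllib.parse.urlparse` before calling `os.path.splitext` keeps the query string out of the decision entirely.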

archmord commented 5 years ago

Thanks, it worked and even downloaded videos, but I noticed that it also downloads the video thumbnails.

elvisyjlin commented 5 years ago

That is how it works, in case someone wants the thumbnails. You can just delete them.
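
Deleting the thumbnails by hand gets tedious for large accounts, so here is a minimal cleanup sketch. It rests on an assumption the thread does not confirm: that a thumbnail is saved with the same base name as its video (for example clip.mp4 next to clip.jpg). If the scraper names files differently, the matching rule needs to change. The function names and extension sets are hypothetical, and the default is a dry run that only lists candidates.

```python
from pathlib import Path

# Assumed extension sets; adjust them to whatever the scraper actually saves.
VIDEO_EXTS = {".mp4", ".mov", ".webm"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def find_video_thumbnails(folder):
    """List image files whose base name matches a video in the same folder."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    video_stems = {p.stem for p in files if p.suffix.lower() in VIDEO_EXTS}
    return sorted(
        p for p in files
        if p.suffix.lower() in IMAGE_EXTS and p.stem in video_stems
    )


def delete_video_thumbnails(folder, dry_run=True):
    """Return the matched thumbnails; remove them only when dry_run is False."""
    thumbnails = find_video_thumbnails(folder)
    if not dry_run:
        for path in thumbnails:
            path.unlink()
    return thumbnails
```

Calling `delete_video_thumbnails("path/to/downloads")` first and reviewing the returned list before repeating the call with `dry_run=False` avoids removing ordinary photos by mistake.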