elvisyjlin / media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
MIT License
385 stars 49 forks

Selenium support for PhantomJS has been deprecated, #7

Open wankio opened 5 years ago

wankio commented 5 years ago

UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead

It happens when I run: python3 -m mediascraper.general [WEB PAGE 1] [WEB PAGE 2] ...

elvisyjlin commented 5 years ago

Hi, even though PhantomJS raises a deprecation warning, you can still use it. However, I suspect your problem is caused by some bugs in mediascraper.general, which I've just fixed. Please git pull and try again. Thank you! Let me know if you have other questions.

wankio commented 5 years ago

After that, I get this error:

Starting PhantomJS web driver...
.\webdriver/phantomjsdriver_2.1.1_win32/phantomjs.exe
C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
52 media are found. Downloading...
  0%| | 0/52 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascraper\general.py", line 14, in <module>
    scraper.download(tasks=tasks, path='download/general')
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascrapers.py", line 107, in download
    download(url, path=target_path, rename=rename, replace=force)
UnboundLocalError: local variable 'target_path' referenced before assignment

elvisyjlin commented 5 years ago

That's another bug, triggered when there is no subfolder. I've just fixed it.
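An `UnboundLocalError` like the one above is the classic symptom of a variable that is assigned only inside one branch. The actual fix in mediascrapers.py is not shown in this thread; the sketch below is a hypothetical illustration (`resolve_target_path` and its `subfolder` parameter are made-up names) of giving `target_path` a value on every path, including the no-subfolder case.

```python
import os
from typing import Optional


def resolve_target_path(path: str, subfolder: Optional[str] = None) -> str:
    """Hypothetical helper: choose the download directory for one task.

    The buggy pattern assigns target_path only when a subfolder exists,
    so the no-subfolder case reaches download(...) with it unbound.
    Assigning a default first makes every branch well-defined.
    """
    target_path = path  # default: save directly under `path`
    if subfolder:
        target_path = os.path.join(path, subfolder)
    return target_path
```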

wankio commented 5 years ago

Another error. Also, is there any way to create a subfolder in general named after the web page title, or as domain/web page title? Thanks.

Traceback (most recent call last):
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascraper\general.py", line 14, in <module>
    scraper.download(tasks=tasks, path='download/general')
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\mediascrapers.py", line 108, in download
    download(url, path=target_path, rename=rename, replace=force)
  File "I:\1_Command tools\media-scraper-master\media-scraper-master\util\url.py", line 47, in download
    r = requests.get(url, stream=True)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 640, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 731, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGhlaWdodD0iNDgwIiB3aWR0aD0iNjQwIiB2aWV3Qm94PSIwIDAgNjQwIDQ4MCI+ICA8ZGVmcz4gICAgPGNsaXBQYXRoIGlkPSJhIj4gICAgICA8cGF0aCBmaWxsLW9wYWNpdHk9Ii42NyIgZD0iTS04NS4zMzQgMGg2ODIuNjd2NTEyaC02ODIuNjd6Ii8+ICAgIDwvY2xpcFBhdGg+ICA8L2RlZnM+ICA8ZyBmaWxsLXJ1bGU9ImV2ZW5vZGQiIGNsaXAtcGF0aD0idXJsKCNhKSIgdHJhbnNmb3JtPSJ0cmFuc2xhdGUoODAuMDAxKSBzY2FsZSguOTM3NSkiPiAgICA8cGF0aCBmaWxsPSIjZWMwMDE1IiBkPSJNLTEyOCAwaDc2OHY1MTJoLTc2OHoiLz4gICAgPHBhdGggZD0iTTM0OS41OSAzODEuMDVsLTg5LjU3Ni02Ni44OTMtODkuMTM3IDY3LjU1IDMzLjE1Mi0xMDkuNzctODguOTczLTY3Ljc4NCAxMTAuMDgtLjk0NSAzNC4xNDItMTA5LjQ0IDM0Ljg3MyAxMDkuMTkgMTEwLjA4LjE0NC04OC41MTcgNjguNDIzIDMzLjg4NCAxMDkuNTN6IiBmaWxsPSIjZmYwIi8+ICA8L2c+PC9zdmc+'
Exception ignored in: <function tqdm.__del__ at 0x0000028023C979D8>
Traceback (most recent call last):
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_tqdm.py", line 931, in __del__
    self.close()
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_tqdm.py", line 1133, in close
    self._decr_instances(self)
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_tqdm.py", line 496, in _decr_instances
    cls.monitor.exit()
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\site-packages\tqdm\_monitor.py", line 52, in exit
    self.join()
  File "C:\Users\GEN32UC\AppData\Local\Programs\Python\Python37\lib\threading.py", line 1029, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
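The `InvalidSchema` error comes from handing a `data:` URI (here an inline base64-encoded SVG embedded in the page) to `requests.get`, which by default only has connection adapters for `http://` and `https://` URLs. One possible workaround, sketched here as an assumption rather than the project's fix, is to decode `data:` URIs locally and send only network URLs to `requests`. `is_network_url` and `decode_data_uri` are hypothetical helper names.

```python
import base64
from typing import Tuple
from urllib.parse import unquote_to_bytes, urlparse


def is_network_url(url: str) -> bool:
    """True if requests' default adapters can fetch this URL."""
    return urlparse(url).scheme.lower() in ("http", "https")


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a data: URI (RFC 2397) into (media type, decoded bytes)."""
    if not uri.lower().startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("malformed data: URI: missing ','")
    params = header.split(";")
    # RFC 2397 default media type when none is given.
    media_type = params[0] or "text/plain"
    if "base64" in (p.lower() for p in params[1:]):
        return media_type, base64.b64decode(payload)
    # Non-base64 payloads are percent-encoded.
    return media_type, unquote_to_bytes(payload)
```

Inside the download loop, a URL for which `is_network_url` returns false could be passed to `decode_data_uri` and its bytes written straight to a file, so it never reaches `requests.get`.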

elvisyjlin commented 5 years ago

  1. I've modified mediascraper.general to save media in a folder named after the page title.
  2. I cannot reproduce your error. Would you mind giving me more information on how to reproduce it?
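Page titles often contain characters that Windows forbids in file and folder names (`< > : " / \ | ? *`), so a title-based folder needs sanitizing. A minimal sketch, assuming the title has already been read from the page; `safe_name`, `folder_for_page` and the domain/title layout requested above are hypothetical, not the code now in mediascraper.general.

```python
import os
import re
from urllib.parse import urlparse

# Characters Windows rejects in file/folder names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(text: str, fallback: str = "untitled", max_len: int = 100) -> str:
    """Turn arbitrary text (e.g. a page title) into one safe path component.

    Reserved device names such as CON or NUL are not handled here.
    """
    name = _INVALID_CHARS.sub("_", text).strip()
    # Truncate, then drop trailing dots/spaces, which Windows also rejects.
    name = name[:max_len].rstrip(". ")
    return name or fallback


def folder_for_page(url: str, title: str, root: str = "download/general") -> str:
    """Hypothetical layout: <root>/<domain>/<page title>."""
    domain = urlparse(url).netloc or "unknown-domain"
    return os.path.join(root, safe_name(domain), safe_name(title))
```

With `root` defaulting to the `download/general` path seen in the traceback, a page titled `My / Page` on `example.com` would be saved under `download/general`, then `example.com`, then `My _ Page`, joined with the platform's path separator.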

sintaxx commented 3 years ago

Is this project still active? I'm also having similar issues; I can paste the output if I get a response.