chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License

Broken on Python3.8.5 / WSL2 / Ubuntu 20.04 #56

Closed by emilheunecke 3 years ago

emilheunecke commented 3 years ago
>>> google = Profile('https://www.instagram.com/google/')
>>> google.scrape()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/emil/.local/lib/python3.8/site-packages/instascrape/core/_static_scraper.py", line 110, in scrape
    self.json_dict = self._get_json_from_source(self.source, headers=headers)
  File "/home/emil/.local/lib/python3.8/site-packages/instascrape/core/_static_scraper.py", line 206, in _get_json_from_source
    json_dict_str = self._json_str_from_soup(self.soup)
  File "/home/emil/.local/lib/python3.8/site-packages/instascrape/core/_static_scraper.py", line 237, in _json_str_from_soup
    json_script = [str(script) for script in soup.find_all("script") if "config" in str(script)][0]
IndexError: list index out of range
chris-greening commented 3 years ago

Hello! I'm not able to reproduce this error. Were you able to use the library at all prior to this, or did this occur as soon as you started?

emilheunecke commented 3 years ago

Strange! It occurred as soon as I started.

chris-greening commented 3 years ago

Does printing google.html give you anything? If so, can you post the result here?

I had this exact same problem about a week ago because Instagram started hitting me with 429 HTTP status codes on every single request. I ended up fixing it by passing in a proper default User-Agent in the request header, and it seems like you might be running into the same problem.
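
For anyone landing here with the same 429s, here is a minimal sketch of what "passing a User-Agent in the request header" can look like with plain requests. The User-Agent string and the helper names `build_headers` and `fetch_profile_page` are made up for illustration and are not part of instascrape.

```python
# Illustrative browser-like User-Agent; any realistic value can be substituted.
# This exact string is an assumption, not the library's default.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
)


def build_headers(user_agent=EXAMPLE_USER_AGENT):
    """Return request headers carrying an explicit User-Agent."""
    return {"User-Agent": user_agent}


def fetch_profile_page(url, headers=None):
    """Fetch a page with an explicit User-Agent (hypothetical helper)."""
    # Imported here so the header helper above works without requests installed.
    import requests

    return requests.get(url, headers=headers or build_headers(), timeout=10)
```

Calling `fetch_profile_page('https://www.instagram.com/google/')` and checking `.status_code` shows whether the header makes a difference. The traceback above shows `scrape` forwarding a `headers` value, so `google.scrape(headers=build_headers())` may work too, but that is unverified against 1.6.1.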

emilheunecke commented 3 years ago

Also tested on Python 3.9.1 on Debian Buster using Docker:

emil@MatebookD:~$ docker run -it python:3.9.1-buster bash
Unable to find image 'python:3.9.1-buster' locally
3.9.1-buster: Pulling from library/python
6c33745f49b4: Pull complete
c87cd3c61e27: Pull complete
05a3c799ec37: Pull complete
a61c38f966ac: Pull complete
c2dd6d195b68: Pull complete
29b9446ae7bd: Pull complete
09cf96c794f9: Pull complete
f674fd97fba7: Pull complete
9c7f9d05b1c1: Pull complete
Digest: sha256:341cf29e353c5ae49f1972e6472cbd0cd5ed3b2984c5c353167d331eca679827
Status: Downloaded newer image for python:3.9.1-buster
root@0faaafa4a2ed:/# pip3 install insta-scrape
Collecting insta-scrape
  Downloading insta_scrape-1.6.1-py3-none-any.whl (26 kB)
Collecting beautifulsoup4
  Downloading beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
     |████████████████████████████████| 115 kB 2.6 MB/s
Collecting soupsieve>1.2
  Downloading soupsieve-2.1-py3-none-any.whl (32 kB)
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 1.3 MB/s
Collecting certifi>=2017.4.17
  Downloading certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB 2.9 MB/s
Collecting chardet<5,>=3.0.2
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 4.2 MB/s
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 1.8 MB/s
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.2-py2.py3-none-any.whl (136 kB)
     |████████████████████████████████| 136 kB 3.9 MB/s
Installing collected packages: urllib3, soupsieve, idna, chardet, certifi, requests, beautifulsoup4, insta-scrape
Successfully installed beautifulsoup4-4.9.3 certifi-2020.12.5 chardet-4.0.0 idna-2.10 insta-scrape-1.6.1 requests-2.25.1 soupsieve-2.1 urllib3-1.26.2
root@0faaafa4a2ed:/# python3
Python 3.9.1 (default, Dec 12 2020, 13:15:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from instascrape import *
>>> google = Profile('https://www.instagram.com/google/')
>>> google.scrape()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/instascrape/core/_static_scraper.py", line 110, in scrape
    self.json_dict = self._get_json_from_source(self.source, headers=headers)
  File "/usr/local/lib/python3.9/site-packages/instascrape/core/_static_scraper.py", line 206, in _get_json_from_source
    json_dict_str = self._json_str_from_soup(self.soup)
  File "/usr/local/lib/python3.9/site-packages/instascrape/core/_static_scraper.py", line 237, in _json_str_from_soup
    json_script = [str(script) for script in soup.find_all("script") if "config" in str(script)][0]
IndexError: list index out of range
emilheunecke commented 3 years ago
>>> google.html
'Oops, an error occurred.\n'
chris-greening commented 3 years ago

Hmmm, I've never seen that before, and I've seen quite a lot of Instagram errors.

Okay, last two questions lol: can you try requests.get('https://www.instagram.com/google/') and print the result? Also, are you able to log in to Instagram normally in your browser?
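
A rough sketch of that check, for reference. The failing line in the traceback takes element `[0]` of the list of script tags containing "config", so a body like 'Oops, an error occurred.\n' with no script tags leaves that list empty and raises the IndexError. The helpers `diagnose` and `check_live` are hypothetical names, and the classification rules are a simplification.

```python
def diagnose(status_code, body):
    """Return a short guess at why scraping a page failed."""
    if status_code == 429:
        return "rate limited (HTTP 429)"
    if status_code >= 400:
        return "HTTP error {}".format(status_code)
    if "<script" not in body.lower():
        # Nothing for the scraper to pick its JSON out of.
        return "no <script> tags in the response, so no embedded JSON to parse"
    return "response looks like a normal page"


def check_live(url):
    """Fetch url and print the status code plus a diagnosis."""
    # Imported here so diagnose() can be used without requests installed.
    import requests

    response = requests.get(url, timeout=10)
    print(response.status_code)
    print(diagnose(response.status_code, response.text))
```

Running `check_live('https://www.instagram.com/google/')` prints the status code first, which separates a 429 from an error page served with a 200.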

emilheunecke commented 3 years ago

What do you know, reboot fixed it.

chris-greening commented 3 years ago

lol when in doubt, reboot! Awesome, thanks for the update 😎