Open stefco opened 3 years ago
I tried to run the newly added download_carousel method on the following Google Instagram post, but ran into a TypeError.
from instascrape import *
from pathlib import Path


def insta_scrape(ig_links):
    for link in ig_links:
        post = link.split("?")[0] if "copy_link" in link else link
        post_folder = Path(post.split("/")[-2])
        post_folder.mkdir(parents=True, exist_ok=True)
        google_post = Post(post)
        google_post.download_carousel(str(post_folder), allow_non_carousel=True)


if __name__ == "__main__":
    link_list = list()
    ig_link = "https://www.instagram.com/p/CXuAeZ1ltCa/?utm_source=ig_web_copy_link"
    link_list.append(ig_link)
    insta_scrape(link_list)
Running the code above, I get the following traceback:
$ py insta_scrape.py
Traceback (most recent call last):
  File "insta_scrape.py", line 18, in <module>
    insta_scrape(link_list)
  File "insta_scrape.py", line 11, in insta_scrape
    google_post.download_carousel(str(post_folder), allow_non_carousel=True)
  File "...\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 213, in download_carousel
    urls = self.parse_carousel_urls()
  File "..\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 157, in parse_carousel_urls
    is_videos = self._filter_get(self.flat_json_dict, self._IS_VIDEO_KEYS)
  File "..\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 133, in _filter_get
    return [(k, dic[k]) for k in keys if k in dic]
  File "..\insta_download\py_venv\lib\site-packages\instascrape\scrapers\post.py", line 133, in <listcomp>
    return [(k, dic[k]) for k in keys if k in dic]
TypeError: argument of type 'NoneType' is not iterable
(py_venv)
Stepping through with the debugger, I noticed that self.flat_json_dict is None. I am not sure whether anyone else has come across this error.
I had forgotten to call the scrape method before download_carousel. With the new google_post.scrape() line added, the function now works as intended.
from instascrape import *
from pathlib import Path


def insta_scrape(ig_links):
    for link in ig_links:
        # Drop the tracking query string that "copy link" appends to the URL.
        post = link.split("?")[0] if "copy_link" in link else link
        # Use the post shortcode (second-to-last path segment) as the folder name.
        post_folder = Path(post.split("/")[-2])
        post_folder.mkdir(parents=True, exist_ok=True)
        google_post = Post(post)
        google_post.scrape()  # the missing call: load the post data first
        google_post.download_carousel(str(post_folder), allow_non_carousel=True)


if __name__ == "__main__":
    link_list = list()
    ig_link = "https://www.instagram.com/p/CXuAeZ1ltCa/?utm_source=ig_web_copy_link"
    link_list.append(ig_link)
    insta_scrape(link_list)
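The failure mode can be reproduced without the library. The sketch below copies the list comprehension from the traceback into a standalone function and passes it None, which is what self.flat_json_dict held before scrape() was called. The key name "is_video" and the safe_filter_get guard are illustrative assumptions, not part of instascrape.

```python
def filter_get(dic, keys):
    """Same list comprehension as the one shown in the traceback."""
    return [(k, dic[k]) for k in keys if k in dic]


def safe_filter_get(dic, keys):
    """Hypothetical guard: fail early with a clearer message when no data is loaded."""
    if dic is None:
        raise RuntimeError("no scraped data available; call scrape() first")
    return filter_get(dic, keys)


# `"is_video" in None` is what raises: None does not support membership tests.
try:
    filter_get(None, ["is_video"])  # "is_video" is a made-up key for illustration
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
```

A guard of this kind would turn the opaque TypeError into a message that points at the missing scrape() call; whether the library should do so is a design decision for the maintainers.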
Description
Commits fb0c968 and b51f725:
Adds a download_carousel method for Posts which allows you to download all media on carousel posts, i.e. posts with multiple images/videos, as raised in #105. Since this is a batch operation, you specify an output directory and a function for calculating the filename for each output image instead of specifying a single output filename. See the method documentation for details.
Also added a couple of supporting methods, though only one of them, parse_carousel_urls, is public; this method simply returns the video and image URLs for each image in the carousel, or None if the post is not a carousel. Again, see the docstring for details.
Also added the beginnings of a demo Jupyter notebook.
Fixes #105
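To illustrate the "output directory plus filename function" design described for download_carousel, here is a minimal sketch that only plans the downloads and never touches the network. Everything in it is an assumption made for illustration: the (image_url, video_url) item shape, the plan_downloads and default_name helpers, and the example.com URLs. The real signatures are in the method docstrings.

```python
from pathlib import Path


def default_name(index, url):
    """Hypothetical naming function: zero-padded index plus the URL's extension."""
    suffix = Path(url.split("?")[0]).suffix or ".bin"
    return f"{index:02d}{suffix}"


def plan_downloads(items, out_dir, name_fn=default_name):
    """Map each carousel item to a (url, destination path) pair."""
    plan = []
    for index, (image_url, video_url) in enumerate(items, start=1):
        url = video_url or image_url  # prefer the video when the item has one
        plan.append((url, Path(out_dir) / name_fn(index, url)))
    return plan


# Assumed item shape: (image_url, video_url or None) per carousel entry.
items = [
    ("https://example.com/a.jpg", None),
    ("https://example.com/b.jpg", "https://example.com/b.mp4?x=1"),
]
plan = plan_downloads(items, "CXuAeZ1ltCa")
```

Passing a function rather than a single filename lets the caller decide how each item in the batch is named, for example by index, by media type, or by timestamp.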
Commit e88032e:
Post.get_recent_comments would raise a KeyError when using Selenium or a requests.Session object to scrape a Post due to slight differences in the structure of the resulting json_dict. I added an except block to handle this and try the alternative json_dict schema.
Fixes #124
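The fallback pattern described for commit e88032e can be sketched as follows. The key names "primary" and "alternative" are placeholders, not the actual json_dict layouts, and recent_comments is a hypothetical stand-in for the real method.

```python
def recent_comments(json_dict):
    """Try one schema first; on KeyError, retry with the alternative layout."""
    try:
        return json_dict["primary"]["comments"]
    except KeyError:
        # Placeholder for the layout seen with Selenium or a requests.Session.
        return json_dict["alternative"]["comments"]


plain_result = recent_comments({"primary": {"comments": ["nice"]}})
session_result = recent_comments({"alternative": {"comments": ["great"]}})
```

If neither layout matches, the second lookup still raises KeyError, so a genuinely unexpected schema is not silently swallowed.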
Commit f443435:
Add Profile.iter_posts to get a lazy iterator over posts, and reimplement Profile.get_posts (with the same API) using iter_posts.
Fixes #127
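A rough sketch of how a list-returning method can be rebuilt on top of a lazy iterator is shown below. FakeProfile, its shortcodes argument, the amount parameter, and the fetched counter are all invented for illustration and do not reflect the real Profile API.

```python
from itertools import islice


class FakeProfile:
    """Stand-in for a profile; strings take the place of Post objects."""

    def __init__(self, shortcodes):
        self._shortcodes = shortcodes
        self.fetched = 0  # counts how many posts were actually produced

    def iter_posts(self):
        """Yield posts one at a time, doing work only when asked."""
        for code in self._shortcodes:
            self.fetched += 1
            yield code

    def get_posts(self, amount=None):
        """Eager variant built on iter_posts; amount=None means all posts."""
        return list(islice(self.iter_posts(), amount))


profile = FakeProfile(["a", "b", "c"])
first_two = profile.get_posts(2)
```

Because get_posts only consumes what it needs from the generator, asking for two posts does the work for two posts, and callers who want streaming behaviour can use iter_posts directly.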
Checklist
Additional notes (optional)
I have not written automated tests yet, but will do so soon.