chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
https://chris-greening.github.io/instascrape/
MIT License

UnicodeEncodeError when using Post.to_csv if emoji in caption #31

Closed chris-greening closed 4 years ago

chris-greening commented 4 years ago

Describe the bug

If the caption of a scraped post contains an emoji, the Post.to_csv instance method raises a UnicodeEncodeError when it tries to write the data to a .csv file.

To Reproduce

from instascrape import Post 
url = 'https://www.instagram.com/p/CGa0nQBljxN/'
post = Post(url)
post.load()
post.to_csv('test.csv')

and this raises

D:\Programming\pythonstuff\instascrape\instascrape\scrapers\post.py in to_csv(self, fp)
     42         # have to convert to serializable format
     43         self.upload_date = datetime.datetime.timestamp(self.upload_date)
---> 44         super().to_csv(fp=fp)
     45         self.upload_date = datetime.datetime.fromtimestamp(self.upload_date)
     46

D:\Programming\pythonstuff\instascrape\instascrape\core\_static_scraper.py in to_csv(self, fp)
     86             writer = csv.writer(csv_file)
     87             for key, value in self.to_dict().items():
---> 88                 writer.writerow([key, value])
     89
     90     def to_json(self, fp: str) -> None:

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u2728' in position 224: character maps to <undefined>

Expected behavior

It's expected that this would write the scraped data to a .csv file.
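
The traceback ends inside the cp1252 codec, so the failure can likely be reproduced without scraping anything. The following is a minimal sketch, assuming only that the caption contains '\u2728' (the character named in the error message); the caption string itself is made up for illustration.

```python
# U+2728 is the character reported in the UnicodeEncodeError above.
caption = "New post \u2728"  # hypothetical caption, not the real post's text

# cp1252 has no mapping for U+2728, so encoding with it fails in the same
# way the csv writer does when the file was opened with that codec.
try:
    caption.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print(exc)

# UTF-8 can represent every Unicode code point, so this call succeeds.
utf8_bytes = caption.encode("utf-8")
```

On this machine, open() without an explicit encoding evidently used the locale default (cp1252), which is why the write only fails once a character outside that code page appears.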


satyabansahoo2000 commented 4 years ago

@chris-greening I guess this might help: post.to_csv('test.csv', header=True, index=False, encoding='utf-8'). This will encode all the text as UTF-8.
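
Note that the traceback shows to_csv(self, fp) taking only a file path, so the extra keyword arguments above (which resemble pandas' DataFrame.to_csv) may not be accepted as written. Another option is to set the encoding where the file is opened. Here is a minimal sketch of that idea, assuming the fix belongs in the open() call; write_dict_to_csv is a hypothetical helper that mirrors the loop in the traceback, not the library's actual code, and the sample data is invented.

```python
import csv
import os
import tempfile


def write_dict_to_csv(data: dict, fp: str) -> None:
    """Hypothetical helper: write key/value pairs as two-column CSV rows."""
    # encoding="utf-8" avoids falling back to the locale codec (cp1252 in
    # the traceback); newline="" is what the csv module docs recommend for
    # files handed to csv.writer.
    with open(fp, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for key, value in data.items():
            writer.writerow([key, value])


# Usage with made-up data containing the character from the error message.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_dict_to_csv({"caption": "New post \u2728", "likes": 10}, path)

# Read the file back with the same encoding to confirm the round trip.
with open(path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```

Anything that reads the file afterwards also needs to open it as UTF-8; some spreadsheet tools on Windows may expect "utf-8-sig" instead, which is a separate choice from the fix sketched here.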

Devarsh-leo commented 4 years ago

Can I send a pull request for this bug?

chris-greening commented 4 years ago

@Devarsh-leo go for it!!!!