Benny- / Yahoo-ticker-symbol-downloader

A web scraper for ticker symbols from Yahoo Finance
https://pypi.python.org/pypi/Yahoo-ticker-downloader/

downloading interruption #7

Closed: depend closed this issue 8 years ago

depend commented 9 years ago

After running it for a while I got this error:

Traceback (most recent call last):
  File "D:\Python27\Scripts\YahooTickerDownloader.py", line 140, in <module>
    main()
  File "D:\Python27\Scripts\YahooTickerDownloader.py", line 97, in main
    downloadEverything(downloader, tickerType, args.insecure)
  File "D:\Python27\Scripts\YahooTickerDownloader.py", line 51, in downloadEverything
    print (" " + unicode(symbols[1]))
UnicodeEncodeError: 'gbk' codec can't encode character u'\xc3' in position 27: illegal multibyte sequence

If I restart it, it gets further before throwing another one. Is this just a Yahoo issue or something else?

underdpt commented 9 years ago

I've also gotten this error. I think it occurs when the script tries to print some results from the latest request and chokes on certain characters in a stock name (in this output, for example, the error happens just as the second stock is printed):

req https://finance.yahoo.com/lookup/?s=d&r=&m=ALL&t=S&b=1100
Got 20 downloaded Stock symbols:
 Stock DRYN PNK DRAYTON RICHDALE NEW
A exception occurred while downloading. Suspending downloader to disk
Successfully saved download state
Remove downloader.pickle if this error persists
Issues can be reported on https://github.com/Benny-/Yahoo-ticker-symbol-downloader/issues

Traceback (most recent call last):
  File "YahooTickerDownloader.py", line 139, in <module>
    main()
  File "YahooTickerDownloader.py", line 96, in main
    downloadEverything(downloader, tickerType, args.insecure)
  File "YahooTickerDownloader.py", line 50, in downloadEverything
    print (" " + unicode(symbols[1]))
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x81' in position 59: character maps to <undefined>

alanbertadev commented 9 years ago

There is a workaround for this. If you are on Linux, you can set these environment variables:

LANGUAGE=en_US.UTF-8 LC_ALL=en_US.UTF-8
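The variables help because Python typically takes the codec it uses for printed text from the locale (on Linux) or, under Python 2, from the console code page (on Windows). A minimal sketch for checking which codec that is on a given machine, written in Python 3 syntax even though the downloader itself runs on Python 2.7; `output_encoding` is a hypothetical helper, not part of the downloader. On Windows, where these locale variables do not apply, setting `PYTHONIOENCODING=utf-8` is another option not discussed in this thread, although the console font still has to be able to show the characters.

```python
import locale
import sys


def output_encoding(stream=None):
    """Return the codec Python will use when text is printed to ``stream``.

    Falls back to the locale's preferred encoding when the stream reports
    none (e.g. output redirected to a file under Python 2).
    """
    if stream is None:
        stream = sys.stdout
    return getattr(stream, "encoding", None) or locale.getpreferredencoding(False)


if __name__ == "__main__":
    # Often "UTF-8" in a Linux shell with LC_ALL=en_US.UTF-8; under Python 2 on a
    # Windows console it reflects the code page, e.g. "cp850" or "cp936" (GBK).
    print(output_encoding())
```

Running it in the same terminal before and after exporting the variables should show whether they took effect.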

underdpt commented 9 years ago

Hi,

I'm on Windows. After a few hours of trying to get the decoding/encoding working, I gave up and ended up catching that exception and letting it pass:

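try:
    print (" " + unicode(symbols[0]))
    print (" " + unicode(symbols[1]))
    print ("  etc...")
except UnicodeEncodeError:
    # The console's code page cannot represent some character in a name; skip the preview.
    pass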
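Catching the exception hides the whole preview line. A possible alternative, sketched here in Python 3 and not taken from the project, is to encode with `errors="replace"` so that only the characters the console cannot show are swapped for `?` and the rest of the name is still printed. `safe_print` is a hypothetical helper name.

```python
import sys


def safe_print(text, stream=None):
    """Print ``text``, replacing characters the stream's encoding cannot represent.

    Returns the string actually written (without the trailing newline).
    """
    if stream is None:
        stream = sys.stdout
    # Assume plain ASCII if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # errors="replace" substitutes "?" instead of raising UnicodeEncodeError.
    printable = text.encode(encoding, "replace").decode(encoding)
    stream.write(printable + "\n")
    return printable


if __name__ == "__main__":
    safe_print(" Stock DRYN PNK DRAYTON RICHDALE NEW")
```

The trade-off is that the output is lossy: a `?` tells you a character was dropped, but not which one.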

I think the encoding in Yahoo's database isn't standardized. For example, the Brazilian symbol BEES4.SA looks like this:

Banestes S.A - Banco do Estado do Espírito Santo S.A. (BEES4.SA)

But if you take a look at SDIL11.SA:

SDI Logística Rio - Fundo de Investimento Imobiliário - FII (SDIL11.SA)

Here, the accented characters look right.

I think that if I'm going to use Yahoo's finance data, it may need character-encoding detection somewhere :-(
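One possible reading, which the thread does not confirm, is that some names are UTF-8 bytes that were decoded as Latin-1 somewhere along the way. Both characters in the tracebacks fit that pattern: 0xC3 is the lead byte of the UTF-8 encoding of every letter from U+00C0 to U+00FF, and "Á" is encoded as the bytes C3 81. Under that assumption, a cheap heuristic (a stand-in for real encoding detection; `repair_mojibake` is a hypothetical name) is to reverse the mis-decoding and keep the result only if the bytes are valid UTF-8:

```python
def repair_mojibake(text):
    """Undo one round of UTF-8 bytes having been mis-decoded as Latin-1.

    If ``text`` can be re-encoded as Latin-1 and the resulting bytes are
    valid UTF-8, the decoded result is returned; otherwise ``text`` is
    returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    # "í" (UTF-8 bytes C3 AD) mis-read as Latin-1 shows up as "Ã" plus a soft hyphen.
    garbled = "Banco do Estado do Esp\xc3\xadrito Santo"
    print(repair_mojibake(garbled) == "Banco do Estado do Esp\xedrito Santo")  # True
    # Correctly decoded text is left alone: its Latin-1 bytes are not valid UTF-8.
    print(repair_mojibake("SDI Log\xedstica Rio") == "SDI Log\xedstica Rio")  # True
```

The heuristic can misfire on genuine Latin-1 text that happens to form valid UTF-8 (such as "Ã©", which it would turn into "é"), so it is only a starting point.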

Benny- commented 8 years ago

I think the problem lies in the system's default character encoding. If it is not set to a UTF encoding, not every Unicode character can be displayed on the output device, hence the error: Python tries to encode Unicode text into a character set that has no mapping for some of its characters.
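This explanation can be checked without the downloader or Yahoo at all: encoding the characters from the two tracebacks with the codecs they name raises the same UnicodeEncodeError, while UTF-8 has no such gap. A minimal sketch in Python 3 syntax; `try_encode` is a hypothetical helper.

```python
def try_encode(text, codec):
    """Encode ``text`` with ``codec``; return None if some character has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        print("%s: %s" % (codec, exc))
        return None


if __name__ == "__main__":
    try_encode(u"\x81", "cp850")  # fails, as in the second traceback
    try_encode(u"\xc3", "gbk")    # fails, as in the first traceback
    print(try_encode(u"\xc3", "utf-8"))  # succeeds: UTF-8 can encode every character
```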

I consider @alanbertadev's solution to be the best solution.

But there might be valid reasons to use an unusual character encoding, so I will merge @underdpt's solution into master:

try:
    print (" " + unicode(symbols[0]))
    print (" " + unicode(symbols[1]))
    print ("  etc...")
except UnicodeEncodeError:
    # The console's encoding has no mapping for some character in a ticker name.
    print (" Could not display some ticker symbols due to char encoding")

Benny- commented 8 years ago

This issue is explained better in #9.