BCCN-Prog / weather_2016

For the BCCN 2016 advanced programming project

Some HTML files result in UnicodeDecodeError when read by BeautifulSoup #60

Closed: denisalevi closed this issue 8 years ago

denisalevi commented 8 years ago

I couldn't find a solution, but the error seems to appear for only 3 files (out of all files downloaded so far), all of them from the same download time. So unless somebody has already encountered and fixed this problem, just remove the following files from the server:

accuweather_01-06-2016_17\:07_frankfurt_daily_d1_1464793657.html
accuweather_01-06-2016_17\:07_frankfurt_daily_d4_1464793657.html
accuweather_01-06-2016_17\:07_frankfurt_daily_d5_1464793657.html
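
If the error comes from opening the files in text mode (an assumption; the thread does not say where the decoding fails), one possible workaround is to read the raw bytes and decode them leniently before handing the text to the parser. A minimal sketch, where `read_html_lenient` is a hypothetical helper name and UTF-8 is an assumed encoding:

```python
from pathlib import Path


def read_html_lenient(path, encoding="utf-8"):
    """Return the file's text, replacing undecodable bytes with U+FFFD.

    Hypothetical helper for illustration; the default encoding is an assumption.
    """
    raw = Path(path).read_bytes()
    # errors="replace" substitutes U+FFFD instead of raising UnicodeDecodeError
    return raw.decode(encoding, errors="replace")
```

The returned string could then be passed to BeautifulSoup as usual. Replaced bytes show up as `\ufffd` in the text, so affected files can still be detected afterwards.
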
erensezener commented 8 years ago

But did I give you all of the accuweather data, or just a sample?

erensezener commented 8 years ago

Yes, there are other files with the same problem. Can you handle this in your code?

denisalevi commented 8 years ago

I think you gave me all the accuweather data last week. I will try to handle it in my script later today.

denisalevi commented 8 years ago

Are the files that give you the error new ones from last week?

erensezener commented 8 years ago

No. For instance, accuweather_02-06-2016_17:07_bielefeld_daily_d5_1464880053.html

denisalevi commented 8 years ago

Why should I handle this in my script? Don't you catch errors anyway, so that when an error occurs the file just isn't saved to the database? There isn't more I can do anyway.

And since the scraped data still has some files that will raise AssertionErrors (from my checks), e.g. the files from April that are for French cities instead of German ones (we didn't delete those), we will have to catch assertion errors anyway. I will write a log file where all caught errors are logged, so we can see what happens.
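
A rough sketch of what such catch-and-log handling could look like. All names here (`process_files`, `parse`, `parse_errors.log`) are hypothetical and not taken from the project's code; `parse` stands in for whatever function reads one HTML file and runs the checks:

```python
import logging


def process_files(paths, parse, log_path="parse_errors.log"):
    """Call parse(path) for each path; log and skip files that fail.

    `parse` and `log_path` are hypothetical names for illustration.
    Returns the list of paths that were skipped.
    """
    logger = logging.getLogger("scrape_errors")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    logger.addHandler(handler)
    skipped = []
    try:
        for path in paths:
            try:
                parse(path)
            except (UnicodeDecodeError, AssertionError) as err:
                # Record the file name and error type, then move on
                logger.error("%s: %s: %s", path, type(err).__name__, err)
                skipped.append(path)
    finally:
        # Detach and close the handler so the log file is flushed
        logger.removeHandler(handler)
        handler.close()
    return skipped
```

Only the two error types mentioned in this thread are caught, so any other exception would still surface instead of being silently logged.
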

denisalevi commented 8 years ago

Should be handled in your script as shown in #78