BCCN-Prog / weather_2016

For the BCCN 2016 advanced programming project

Run accuweather scraping #78

Open denisalevi opened 8 years ago

denisalevi commented 8 years ago

Well @erensezener I couldn't pull from the server since you have merge conflicts... I don't want to mess around with your stuff, so please run my scraper after pulling the new version. But please run it like this:

import itertools
import traceback

# days, months, year, cities, DATAPATH and sc_ac are defined in the existing scraper script
assertion_count = 0
unicode_count = 0
others_count = 0
file_count = 0
for day, month, city in itertools.product(days, months, cities):
    date_string = '{}-{}-{}'.format(day, month, year)

    try:
        sc_ac(date_string, city, DATAPATH)
    except Exception as ex:
        # Check the exception's type; comparing the instance to the class with == is always False
        if isinstance(ex, AssertionError): assertion_count += 1
        elif isinstance(ex, UnicodeDecodeError): unicode_count += 1
        else: others_count += 1
        print('Excepted {} in accuweather'.format(type(ex).__name__))
        traceback.print_exc()  # prints the traceback itself and returns None
        with open("accuweather_excepted_errors.txt", "a") as myfile:
            myfile.write('Excepted {} for date={}, city={}:\n{}\n\n'.format(type(ex).__name__, date_string, city, ex))
    file_count += 1

tot = assertion_count + unicode_count + others_count
# Build the summary once so the console output and the log file stay identical
summary = '\n\nFinished saving ACCUWEATHER data to database\n Excepted errors: {}/{}\n\tAssertionErrors: {}\n\t UnicodeDecodeErrors: {}\n\t Other errors: {}\ndetails saved in accuweather_excepted_errors.txt'.format(tot, file_count, assertion_count, unicode_count, others_count)
print(summary)

with open("accuweather_excepted_errors.txt", "a") as myfile:
    myfile.write(summary)

This catches every exception and appends the details to the file accuweather_excepted_errors.txt. Could you please post that file here after running the scraper? Thank you.
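For reference, the same catch-and-log pattern can be sketched in a self-contained form that runs without the scraper. This is only an illustration: `run_with_error_log`, `fake_scrape`, the job values and the demo log file name are hypothetical and not part of the project. It tallies exceptions by type name with `collections.Counter` and writes the full traceback to the log.

```python
import collections
import itertools
import os
import tempfile
import traceback


def run_with_error_log(task, jobs, log_path):
    """Call task(*job) for every job; tally exceptions by type name and log tracebacks."""
    counts = collections.Counter()
    total = 0
    with open(log_path, "a") as log:
        for job in jobs:
            total += 1
            try:
                task(*job)
            except Exception as ex:
                # Key the tally on the exception class name
                counts[type(ex).__name__] += 1
                log.write("Excepted {} for job={}:\n{}\n\n".format(
                    type(ex).__name__, job, traceback.format_exc()))
    return counts, total


def fake_scrape(date_string, city):
    # Hypothetical stand-in for the real scraper call; fails for one city
    if city == "berlin":
        raise AssertionError("no data for {}".format(date_string))


# Write the demo log into a temporary directory so nothing is left in the repo
log_path = os.path.join(tempfile.mkdtemp(), "demo_excepted_errors.txt")
jobs = itertools.product(["1-4-2016", "2-4-2016"], ["berlin", "munich"])
counts, total = run_with_error_log(fake_scrape, jobs, log_path)
print("Excepted errors: {}/{}".format(sum(counts.values()), total))
```

Using a `Counter` means new exception types show up in the summary without adding another counter variable for each one.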

erensezener commented 8 years ago

OK, I am running it now. It will probably take at least a couple of hours.

erensezener commented 8 years ago

It took ~5 hours. Here are the outputs: https://drive.google.com/file/d/0BwQc_CC3arWWOE5YZVpZT1pQQTQ/view?usp=sharing

If it looks ok, please close the issue.