BCCN-Prog / weather_2016

For the BCCN 2016 advanced programming project

Validate accuweather data #83

Open erensezener opened 8 years ago

erensezener commented 8 years ago

Please download the data with _aw suffix here: https://drive.google.com/folderview?id=0BwQc_CC3arWWMTNYaEpCOHlKZmc&usp=sharing

And look at nanmax(), unique(), etc. of the columns and at the number of entries to see if they make sense.

erensezener commented 8 years ago

It works like this:

>>> import h5py
>>> import numpy as np
>>> h5 = h5py.File('hourly_database.hdf5', 'r')
>>> data = h5['weather_data'][:]
>>> np.unique(data[:,2])
array([  0.00000000e+00,   1.00000000e+00,   4.00000000e+00,
         2.01606212e+11])

Beware that the data is padded with rows of zeros at the bottom.
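
Because of that padding, unique() and the entry count include the zero rows unless they are cut off first. A minimal sketch of how this could be done, assuming the padding only ever appears as all-zero rows at the end of the array. The helper names strip_zero_padding and summarize_columns and the toy array are hypothetical and stand in for h5['weather_data'][:]:

```python
import numpy as np


def strip_zero_padding(data):
    """Drop the trailing rows that consist only of zeros."""
    # Indices of rows holding at least one non-zero entry (NaN counts as non-zero).
    nonzero_rows = np.flatnonzero(np.any(data != 0, axis=1))
    if nonzero_rows.size == 0:
        return data[:0]  # everything was padding
    return data[: nonzero_rows[-1] + 1]


def summarize_columns(data):
    """Return (max ignoring NaN, number of unique non-NaN values) per column."""
    summary = []
    for col in data.T:
        valid = col[~np.isnan(col)]
        col_max = valid.max() if valid.size else np.nan
        summary.append((col_max, np.unique(valid).size))
    return summary


# Toy array (hypothetical values), padded with two rows of zeros.
demo = np.array([
    [2.0, 1.0, 2.01606212e11],
    [np.nan, 4.0, 2.01606212e11],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])

trimmed = strip_zero_padding(demo)
print(trimmed.shape)               # number of real entries and columns
print(summarize_columns(trimmed))  # per-column maximum and unique count
```

Note that a genuine measurement row that happens to be all zeros at the very end would be dropped as well, so the row count is worth comparing against the number of scraped files.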

denisalevi commented 8 years ago

I have checked the data using nanmax(), and the values are reasonable. Temperature, cloud_cover and station_id are zero, since they are not included in the daily database I tested on.

I didn't get what information unique() should give me. I get only NaNs.
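
One possible explanation, offered as an assumption rather than a diagnosis: NaN never compares equal to itself, so depending on the NumPy version np.unique() can list every NaN separately, and a column that is mostly NaN then looks like nothing but NaNs. A small sketch that filters the NaNs out first, with the helper name unique_ignoring_nan and the sample column being hypothetical:

```python
import numpy as np


def unique_ignoring_nan(column):
    """Return the sorted unique values of a column, leaving out NaN entries."""
    column = np.asarray(column, dtype=float)
    return np.unique(column[~np.isnan(column)])


# Hypothetical column with a few real values hidden between NaNs.
demo_column = np.array([np.nan, 1.0, 4.0, np.nan, 1.0, 0.0])
print(unique_ignoring_nan(demo_column))
```

If the result is empty, the column really contains no data, which is itself a useful validation signal.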

I have also looked through the errors caught by the scraper and fixed some of them. So if the data is used and you have time tonight, you can rerun it. But it will only add data from ~60 / 3000 html files, which are not included in the current database since they threw errors before. There are still 184 / 3000 html files which give a UnicodeDecodeError. Nothing I can do about that right now. And there are 304 / 3000 files not included because French city forecasts were downloaded instead of German ones. And to my surprise that was happening for an entire month (around 28.4. - 28.5.)... So for that period there is no data.

If you run the scraper again, can you change the ex == ...Error to type(ex) == ...Error? Then the error counter works properly:

    try:
        sc_ac(date_string, city, DATAPATH)
    except Exception as ex:
        if type(ex) == AssertionError:
            assertion_count += 1
        elif type(ex) == UnicodeDecodeError:
            unicode_count += 1
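
The reason the original comparison never counts anything is that ex is an exception instance, while AssertionError is a class, so ex == AssertionError is always False. A self-contained sketch of the same counting idea using separate except clauses, which is an alternative to the type(ex) check and not the scraper's actual code. The function count_errors and the three stand-in tasks are hypothetical and replace the real sc_ac call:

```python
def count_errors(tasks):
    """Run each task and count failures by exception type."""
    counts = {"assertion": 0, "unicode": 0, "other": 0}
    for task in tasks:
        try:
            task()
        except AssertionError:
            counts["assertion"] += 1
        except UnicodeDecodeError:
            counts["unicode"] += 1
        except Exception:
            counts["other"] += 1
    return counts


def failing_assertion():
    # Stand-in for a scraper run that trips an assert.
    assert False, "simulated scraper assertion"


def failing_decode():
    # 0xff is not a valid UTF-8 start byte, so this raises UnicodeDecodeError.
    b"\xff".decode("utf-8")


def succeeding():
    return None


# An instance is never equal to its class, but type() and isinstance() match.
try:
    failing_assertion()
except Exception as ex:
    instance_equals_class = ex == AssertionError
    type_matches = type(ex) == AssertionError

print(instance_equals_class, type_matches)
print(count_errors([failing_assertion, failing_decode, succeeding, failing_decode]))
```

Separate except clauses also match subclasses of the listed exceptions, whereas type(ex) == ... only matches the exact class.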

erensezener commented 8 years ago

Did you push your changes?

erensezener commented 8 years ago

I pulled some stuff but I haven't noticed your changes to the file. Maybe I have overlooked them.

denisalevi commented 8 years ago

I thought I pushed changes. But it only changed my accuweather/functions.py file. You didn't get those?

On 19 Jul 2016, at 12:55, C. Eren Sezener notifications@github.com wrote:

I pulled some stuff but I haven't noticed your changes to the file. Maybe I have overlooked them.

erensezener commented 8 years ago

I pulled and started running it an hour ago. So results should be ready soon.