BCCN-Prog / weather_2016

For the BCCN 2016 advanced programming project

Regarding the missing site info in the datasets #81

Open erensezener opened 8 years ago

erensezener commented 8 years ago

I have been told by @ge00rg and @clauslang that the data is only from one site in the daily DB. I don't see that this is the case, see the below snippet. But why are there sites like 0.5, 1.5 etc.? Is this expected?

>>> import h5py
>>> import numpy as np
>>> h5 = h5py.File('daily_database.hdf5', 'r')
>>> data = h5['weather_data'][:]
>>> np.unique(data[:, 1])  # column 1 is the site
array([ 0. ,  0.5,  1. ,  2.5,  3.5,  4. ])

ge00rg commented 8 years ago

I don't remember the exact query we made, nor do I know why these are the indices; they should be contiguous integers. Did you test the other DB? Can you try using the query engine on it and see whether you get plausible values for get_data and/or get_val_range?

On 2016-07-13 13:02, C. Eren Sezener wrote:

I have been told by @ge00rg and @clauslang that the data is only from one site in the daily DB. I don't see that this is the case, see the below snippet. But why are there sites like 0.5, 1.5 etc.? Is this expected?

h5 = h5py.File('daily_database.hdf5', 'r')
data = h5['weather_data'][:]
np.unique(data[:,1]) #since column 1 is site

array([ 0. , 0.5, 1. , 2.5, 3.5, 4. ])
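
The expectation that site IDs are contiguous integers can be checked directly against the values reported for the daily DB. The following is only a sketch: `check_site_ids` is a hypothetical helper, not part of the project's query engine, and it accepts any iterable of numbers (for example `data[:, 1]`).

```python
def check_site_ids(site_values):
    """Report fractional site IDs and gaps in the whole-number IDs."""
    unique = sorted({float(v) for v in site_values})
    # IDs such as 0.5 that are not whole numbers.
    fractional = [v for v in unique if not v.is_integer()]
    whole = [int(v) for v in unique if v.is_integer()]
    # Whole numbers missing between the smallest and largest whole ID.
    if whole:
        missing = sorted(set(range(whole[0], whole[-1] + 1)) - set(whole))
    else:
        missing = []
    return {
        "fractional": fractional,
        "missing": missing,
        "ok": not fractional and not missing,
    }


# Unique values of column 1 of the daily DB, as reported in this thread.
report = check_site_ids([0.0, 0.5, 1.0, 2.5, 3.5, 4.0])
print(report)
```

For the values above, the helper flags 0.5, 2.5 and 3.5 as fractional and 2 and 3 as missing, so the column fails the check.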

erensezener commented 8 years ago

The hourly DB sites are quite fucked up:

>>> h5 = h5py.File('hourly_database.hdf5', 'r'); data = h5['weather_data'][:]
>>> np.unique(data[:,2])
array([  0.00000000e+00,   1.00000000e+00,   4.00000000e+00,
         2.01606212e+11])

So we have sites 1 and 4 and a date (WTF?)
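
To trace which rows carry the date-like value, one option is to list every row whose site entry is not a plausible ID. This is a rough sketch: `suspicious_site_rows` and the `max_site_id` threshold are assumptions made for illustration, and the toy rows below are made up apart from the site values taken from the output above.

```python
def suspicious_site_rows(rows, site_col, max_site_id):
    """Return (row_index, value) pairs whose site entry is not a
    whole number in the range 0..max_site_id."""
    flagged = []
    for index, row in enumerate(rows):
        value = float(row[site_col])
        if not value.is_integer() or not 0 <= value <= max_site_id:
            flagged.append((index, value))
    return flagged


# Toy rows: only column 2 (the site column in the hourly DB) matters here.
toy_rows = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 4.0],
    [0.0, 0.0, 2.01606212e+11],  # the date-like value seen above
]
flagged = suspicious_site_rows(toy_rows, site_col=2, max_site_id=10)
print(flagged)
```

The same function should work on the array loaded from the HDF5 file, since iterating over a 2-D NumPy array yields its rows. The row indices can then be matched to the scraper that wrote them.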

erensezener commented 8 years ago

Can you try using the query engine on it and see whether you get plausible values for get_data and/or get_val_range?

The DB should be essentially the same. You can check it yourself.

erensezener commented 8 years ago

I am running all the scrapers again so that each one writes its output to a separate DB. Then all scraper authors must review their data, since each DB will contain only their own data.

denisalevi commented 8 years ago

If you want scraper authors to review their DB, please provide a clear code snippet explaining how to access the data or use the query engine, and which data should be stored where.
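
Pending an official snippet, a minimal way to inspect a per-scraper DB could look like the sketch below. It relies only on the `h5py.File(...)['weather_data'][:]` access pattern already used in this thread. The function names and the file name in the usage comment are hypothetical, and the query engine is not used because its call signatures are not shown here.

```python
def load_weather_data(path, dataset="weather_data"):
    """Read the whole dataset into memory, as in the snippets in this thread."""
    import h5py  # imported lazily so summarise_columns works without h5py

    with h5py.File(path, "r") as h5:
        return h5[dataset][:]


def summarise_columns(rows):
    """Return (min, max, distinct count) per column, ignoring NaN entries.

    A column that contains only NaN is summarised as None.
    """
    summary = []
    for column in zip(*rows):
        # v == v is False only for NaN, so this drops missing values.
        values = [float(v) for v in column if v == v]
        if values:
            summary.append((min(values), max(values), len(set(values))))
        else:
            summary.append(None)
    return summary


# Usage with a real file (hypothetical name):
#     data = load_weather_data("my_scraper_daily.hdf5")
#     print(summarise_columns(data))
toy = [[1.0, 0.0], [2.0, 0.5], [3.0, float("nan")]]
summary = summarise_columns(toy)
print(summary)
```

A per-column range and distinct count is usually enough to spot a site column holding fractional values or dates before looking at individual rows.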