fsfrazao / DegDay

Degrees per day project for CMSC6950 course

On the Url download code #1

Open djo504 opened 8 years ago

djo504 commented 8 years ago

The issue right now is that the first task is turning out to be more complicated than I had imagined, and I would love for us all to work on the code together if possible. In the meantime, I will commit what I have done shortly.

fsfrazao commented 8 years ago

That's ok. Let's look at your code and see what we can do.

djo504 commented 8 years ago

The code is right here. Given the tight deadline, I think the plotting guys can carry on with the plots while we get the download code working. I am also thinking we could have some calls on Google Hangouts if need be, so we do not have to wait until Monday or Wednesday to talk in person; I think more frequent discussion is really needed at this phase.

sfatima1 commented 8 years ago

Hi, I found this link (http://pandas.pydata.org/pandas-docs/stable/10min.html) about the pandas library for Python, which is useful for reading .csv files into data frames, applying a function (in our case the GDD calculation), and plotting data.

fsfrazao commented 8 years ago

Dayo,

I've been playing with the URL and figured out a way to download data for a whole year at once: just omit the month tag.

Ex: http://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=1706&Year=2012&timeframe=2&submit=Download+Data

This will retrieve daily data from 01/01/2012 to 12/31/2012. The day tag is not necessary either. With this format you don't need to worry about putting the months together, as we were thinking before.
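A minimal sketch of building that yearly URL in Python, in case it helps. The function name build_year_url is hypothetical; the query parameters are simply the ones that appear in the example URL above, with the month and day tags left out.

```python
from urllib.parse import urlencode

BASE_URL = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html"


def build_year_url(station_id, year):
    """Return the bulk-download URL for one year of data (hypothetical helper)."""
    params = {
        "format": "csv",
        "stationID": station_id,
        "Year": year,
        "timeframe": 2,  # same value as in the example URL, which returns daily data
        "submit": "Download Data",  # urlencode turns the space into '+'
    }
    return BASE_URL + "?" + urlencode(params)


print(build_year_url(1706, 2012))
```

Looping over a range of years and calling the helper once per year would then give one file per year, with no need to stitch months together.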

fsfrazao commented 8 years ago

Also, Jupyter notebooks are great for presenting code interactively or even for trying things out as you are developing, but what we really want here is a command-line tool, so a .py script is probably a better choice.

djo504 commented 8 years ago

Thanks Fabio,

I am aware of that. I am going to remove the day and month tags and go ahead with just the year. I am currently trying to get the code to print clean output with only the needed columns, namely the date and the temperatures needed by the GDD function.

fsfrazao commented 8 years ago

Hey Dayo, I tested your code. It does what it's supposed to do. Great job! How are you planning to do the data cleaning?

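One suggestion is to use pandas: pandas.read_csv() can read a file directly from a URL.

import pandas as pd
# Yearly bulk-download URL for station 1706, as in the example earlier in this thread
url = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=1706&Year=2012&timeframe=2&submit=Download+Data"
data = pd.read_csv(url, sep=',', skiprows=25)  # skip the first 25 rows

This will read the CSV file at the URL and store it in the data object, so you can clean and save it later.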

djo504 commented 8 years ago

Thanks Fabio. I am currently working on that and should commit the improved version soon. My system has been misbehaving; otherwise I would have finished this by now. Do you have an idea of exactly which fields we need in the output? Or will those just be the variables/parameters your GDD function needs?

fsfrazao commented 8 years ago

Hi all,

I uploaded a temp_data.csv file as a suggestion of what the downloaded data could look like after being cleaned by Dayo's script. We don't have to follow this, but it should be useful for getting some work done for now.

djo504 commented 8 years ago

Hi Fabio. After studying pandas and trying it out, I imported it and tried to skip the first 25 lines to keep the output clean, but to no avail. However, I have separate code that can clean the data once it has been downloaded and saved as a file. I will commit both pieces of code shortly, and Fabio and I could meet after class on Monday to sort this out once and for all. In the meantime, I will commit the second script, which cleans the data from a file rather than directly from the URL. I also have some tutorial material on how the whole download process works, which I feel may be useful for the report and earn us good credit.
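One possible way around a hard-coded row count, sketched with the standard csv module rather than pandas: look for the header row and start parsing from there. The function name clean_daily_csv, the column names "Date/Time", "Max Temp" and "Min Temp", and the sample rows are all assumptions made up for illustration; the real download may name its columns differently, so the wanted tuple would need adjusting.

```python
import csv
import io


def clean_daily_csv(text, wanted=("Date/Time", "Max Temp", "Min Temp")):
    """Drop the metadata preamble and keep only the wanted columns."""
    lines = text.splitlines()
    # Locate the header row instead of counting a fixed number of preamble lines.
    start = next(
        i for i, line in enumerate(lines)
        if line.lstrip('"').startswith(wanted[0])
    )
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    rows = []
    for row in reader:
        # Keep a day only when every wanted field is present and non-empty.
        if all(row.get(col) for col in wanted):
            rows.append({col: row[col] for col in wanted})
    return rows


# Made-up sample that mimics a preamble followed by a header and data rows.
sample = '''"Station Name","EXAMPLE"
"Legend"

"Date/Time","Year","Max Temp","Min Temp"
"2012-01-01","2012","4.5","-1.0"
"2012-01-02","2012","",""
'''

for row in clean_daily_csv(sample):
    print(row)
```

The same function works on text read from a saved file or on the decoded body of a download, so the cleaning step does not need to care where the data came from.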

farayola commented 8 years ago

Great job, Fabio and Dayo! Just a gentle reminder that GDD is defined with reference to a base temperature and, often, an upper threshold temperature. While the GDD function takes care of the base temperature, I don't think we have factored in the upper threshold. The upper threshold commonly cited is 86F (30C). This implies we have to include the following constraints in the GDD calculation:

If Tmax > 86F (30C), it is set to 86F (30C).
If Tmax or Tmin < 50F (10C), it is set to 50F (10C).

Kindly let me know your thoughts.
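For illustration, the two constraints could be sketched like this. It assumes the usual averaging formula, GDD = (Tmax + Tmin) / 2 - Tbase, with temperatures in Celsius; the function name gdd and its parameter names are hypothetical and not necessarily what the project's function looks like.

```python
def gdd(tmax, tmin, base=10.0, upper=30.0):
    """Growing degree days for a single day, temperatures in Celsius.

    Assumes the simple averaging formula with the two constraints above.
    """
    # Constraint 1: Tmax above the upper threshold is set to the threshold.
    tmax = min(tmax, upper)
    # Constraint 2: Tmax or Tmin below the base temperature is set to the base.
    tmax = max(tmax, base)
    tmin = max(tmin, base)
    return (tmax + tmin) / 2.0 - base


print(gdd(35.0, 5.0))   # Tmax capped at 30, Tmin raised to 10
print(gdd(8.0, 2.0))    # both raised to 10, so no degree days accumulate
print(gdd(25.0, 15.0))  # neither constraint applies
```

Because both readings are raised to the base before averaging, the result can never be negative, so no extra check against zero is needed.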

fsfrazao commented 8 years ago

You're absolutely right. I missed that bit when I first wrote the function, but it's fixed now. Thanks for pointing that out!

DarrenZDL commented 8 years ago

Hi guys, a friend just told me that there may be some problems with the 2015 Toronto data from the government website when it is downloaded automatically: it does not show temperature data after May. We should check this and pick other cities if it's true.

bfsfrank commented 8 years ago

Hi Dayo! How is your work on the URL download going? I think urlcode.py still has no arguments for the input and output paths. Once you finish that and upload it, please let me know!

Thx!

djo504 commented 8 years ago

Hi cj755, done now. It takes two arguments, the input and the output. The usage has been added to the docstring. Thank you.
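As a rough sketch of what a two-argument command-line interface can look like with argparse: this is not the actual urlcode.py, and the argument names, help strings, and example file names are assumptions, since the thread does not say exactly what the input and output are.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Download and clean daily temperature data."
    )
    parser.add_argument("input", help="input path (assumed meaning)")
    parser.add_argument("output", help="output path (assumed meaning)")
    return parser.parse_args(argv)


# Passing a list makes the parser easy to try without a real command line.
args = parse_args(["raw_2012.csv", "temp_data.csv"])
print(args.input, args.output)
```

Calling parse_args() with no list would read the real command line instead, and running the script with -h would print the usage text that argparse generates from the help strings.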