Haydenfiege opened this issue 3 years ago
Just a quick note in case you didn't see this in my earlier code: the two lines below read the data into pandas directly, without the need to go through a CSV save/read (unless there's a reason to save the CSV for other uses). Currently I put them into a loop to download weather data over a range of dates:

```python
import requests
import pandas as pd
from io import StringIO

csv_dl = requests.get(csv_url).content
df = pd.read_csv(StringIO(csv_dl.decode("utf8")))
```
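For context, a minimal sketch of what that date loop might look like — the URL template, station ID, and date range here are hypothetical placeholders, not the actual endpoint:

```python
import requests
import pandas as pd
from io import StringIO

# Hypothetical URL template; substitute the real download endpoint.
CSV_URL_TEMPLATE = "https://example.com/weather?station={station}&year={year}&month={month}"

frames = []
for year in range(2018, 2021):      # assumed date range, for illustration only
    for month in range(1, 13):
        csv_url = CSV_URL_TEMPLATE.format(station=51442, year=year, month=month)
        csv_dl = requests.get(csv_url).content
        frames.append(pd.read_csv(StringIO(csv_dl.decode("utf8"))))

weather_df = pd.concat(frames, ignore_index=True)
```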
I've also added some basic ETL cleaning procedures to my weather scrape script, under the `weather_proc` function: any column whose name contains a reference to temperature has its missing values filled by interpolation (assuming temperature can be reasonably interpolated when not available), any column with "date" in the name is converted to datetime, and any other column with missing values gets filled with zeros. The logic is sketched below.
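Since the actual `weather_proc` isn't pasted here, this is just a minimal sketch of the cleaning rules described above (the column-name matching is an assumption), not the real implementation:

```python
import pandas as pd

def weather_proc(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: interpolate temperature columns, parse date columns,
    and zero-fill missing values everywhere else."""
    df = df.copy()
    for col in df.columns:
        name = col.lower()
        if "temp" in name:
            # Temperature is assumed to vary smoothly, so interpolate gaps.
            df[col] = pd.to_numeric(df[col], errors="coerce").interpolate()
        elif "date" in name:
            df[col] = pd.to_datetime(df[col], errors="coerce")
        else:
            df[col] = df[col].fillna(0)
    return df
```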
For a full-stack experience, we could take the standard approach of saving the raw CSVs to S3 storage, loading the curated data into a SQL database, and using a query/view from there to feed input to our actual modelling workflow.
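To make that concrete, here's a rough sketch using boto3 and SQLAlchemy — the bucket name, table name, and connection string are all placeholders we'd swap in once the infrastructure exists:

```python
import boto3
import pandas as pd
from sqlalchemy import create_engine

# Placeholder names; replace with our real bucket/database.
BUCKET = "weather-raw-csvs"
DB_URI = "postgresql://user:password@localhost:5432/weather"

def store_raw_csv(local_path: str, key: str) -> None:
    """Upload the raw CSV to S3 so the untouched source is always kept."""
    boto3.client("s3").upload_file(local_path, BUCKET, key)

def store_curated(df: pd.DataFrame, table: str = "weather_curated") -> None:
    """Write the cleaned DataFrame to SQL; modelling reads from a view on this table."""
    engine = create_engine(DB_URI)
    df.to_sql(table, engine, if_exists="append", index=False)
```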
If we end up doing some feature engineering or encoding on the data for modelling purposes after our EDA, we can decide whether those transformations happen during the ETL or as part of the modelling process later.
Right now we are pulling weather data from the Government of Canada, and some numerical columns have letters in them that refer to comments in the website's legend. These flags relate to things like missing data, trace amounts of precipitation, estimated temperatures, "precipitation occurred, amount uncertain", etc. We should be safe to remove these data points from our data set as part of the cleaning process.
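One way to handle that during cleaning — assuming the flags appear mixed into otherwise numeric columns, coercing to numeric turns flagged entries into NaN, which the `weather_proc` rules above can then fill or which we can drop outright:

```python
import pandas as pd

def strip_legend_flags(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Coerce flagged values (e.g. 'M' for missing, 'T' for trace) to NaN.
    numeric_cols is whichever columns we decide should be numeric."""
    df = df.copy()
    for col in numeric_cols:
        # Entries containing legend letters fail numeric parsing and become NaN.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df
```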