UBC-MDS / majacloud

DSCI 525 Group III - Web and Cloud Computing: Development of rainfall predictor via AWS cloud services.
MIT License

Milestone 1 agenda #1

Closed by jianructose 3 years ago

jianructose commented 3 years ago

Milestone 1: Tackling big data on your laptop

Overall project goal and data

During this course, you will be working on a team project involving big data. The purpose is to get exposure to working with much larger datasets than you have used previously in MDS. You have been assigned to teams of three or four. (See group assignment in Canvas.) Unlike previous project courses, in this course all of you will be working on the same problem. In particular, you will be building and deploying ensemble machine learning models in the cloud to predict daily rainfall in Australia on a large dataset (~12 GB), where the features are outputs of different climate models and the target is the actual rainfall observation.

You will be using this dataset on figshare. The dataset has been put together by Tom. See [this notebook](PUT THE NOTEBOOK LINK) if you're interested in understanding how the data was prepared for you.

At the end of the project, you should have your ML model deployed in the cloud for others to use.

During this course, you will work towards this goal step by step in four milestones.



Milestone 1 checklist

Part of the purpose of this milestone is to annoy you by making you work with large data in Pandas and vanilla CSV files. Typically these are not the best for dealing with large data. Along the way, you will also explore some useful tools for working with big data.

rubric={correctness:10}

You can download the data and unzip it manually. However, since we learned about APIs, we can do it in a reproducible way with the requests library, similar to how we did it in class.

There are 5 files in the figshare repo. The one we want is: data.zip
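A minimal sketch of the reproducible download, assuming the figshare v2 articles endpoint (which returns a `files` list with `name` and `download_url` fields). The function names, the `figshare_data` output directory, and the `article_id` argument are hypothetical; the real article ID has to come from the dataset's figshare page, and this is not necessarily how it was done in class.

```python
import zipfile
from pathlib import Path

# Assumed endpoint pattern for the public figshare API (v2).
FIGSHARE_API = "https://api.figshare.com/v2/articles/{article_id}"


def pick_download_url(files, wanted_name):
    """Return the download URL of the listing entry named wanted_name."""
    for entry in files:
        if entry["name"] == wanted_name:
            return entry["download_url"]
    raise FileNotFoundError(f"{wanted_name!r} is not in the file listing")


def download_file(url, dest, chunk_size=1 << 20):
    """Stream url to dest in 1 MiB chunks so the archive is never held in memory."""
    import requests  # imported lazily so the offline helper above works without it

    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in response.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return dest


def fetch_and_extract(article_id, wanted_name="data.zip", out_dir="figshare_data"):
    """Look up wanted_name in the article's file listing, download it, and unzip it."""
    import requests

    listing = requests.get(FIGSHARE_API.format(article_id=article_id), timeout=60)
    listing.raise_for_status()
    url = pick_download_url(listing.json()["files"], wanted_name)
    archive = download_file(url, Path(out_dir) / wanted_name)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
    # Return the extracted file names so the notebook can show what arrived.
    return sorted(p.name for p in Path(out_dir).iterdir())
```

Calling `fetch_and_extract(article_id)` with the project's article ID would leave the unzipped CSV files next to `data.zip` in `figshare_data/`. Streaming the response matters here because the archive is large enough that reading it into memory in one piece may fail on a laptop.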

rubric={correctness:10,reasoning:10}

Warning: Some of you might not be able to do it on your laptop. It's fine if you're unable to do it. Just make sure you check memory usage and discuss the reasons why you might not have been able to run this on your laptop.
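One possible way to check memory usage is sketched below using only the standard library. It assumes that measuring peak Python-level allocations with `tracemalloc` is acceptable for the discussion; `tracemalloc` does not report total process memory, so a system monitor may show larger numbers. The helper names are hypothetical, and the two row counters only illustrate the difference between loading a CSV eagerly and streaming it.

```python
import csv
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB of Python allocations during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def count_rows_eager(path):
    """Load every row into a list first, so memory grows with the file size."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh))
    return max(len(rows) - 1, 0)  # exclude the header row


def count_rows_streaming(path):
    """Read one row at a time, so memory stays roughly constant."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row if there is one
        return sum(1 for _ in reader)
```

Wrapping a step as `peak_memory_mib(count_rows_eager, "some_file.csv")` gives a number to report alongside the timing. If the eager version fails on a laptop, comparing it with the streaming version on a smaller file is one way to show that the cause is holding the whole file in memory at once.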

rubric={correctness:10,reasoning:10}

rubric={correctness:15,reasoning:10}



Specific expectations for this milestone


In the textbox provided on Canvas for the Milestone 1 assignment, include:

jianructose commented 3 years ago

Tentative deadlines for this week:

(Can be adjusted after our discussion)

jianructose commented 3 years ago

Let's leave it open until the submission. ✌