emgrasmeder / emc-aws-data-project

Using Python, Boto, Pandas, and maybe Spark to analyze a bit of data on AWS

Welcome to the documentation for my working project to streamline AWS.

This is a work in progress. Don't expect it to be functional.

I'm working on the task of rudimentary data analysis on a dataset that is just-barely-too-large-for-a-single-computer-to-think-about. The goals for this repository are:

-Familiarize myself with Git
-Provide for the folks on the margin, for whom Amazon Web Services is just barely too much work
-Use most of Git's regular-awesome features, like letting my product owner request functionality or report bugs


I'm basically wandering aimlessly / blazing my own trail when it comes to open source projects AND Amazon Web Services, so naturally I've decided to make these blunders public.


Things you might want or need


In order for this repository to be useful for you, you'll probably need to install Python (2.7?) and Boto, create an Amazon Web Services account, and make a separate file containing your AWS Access Key ID and Secret Access Key. Mine is called "credentials.py", and its key() function simply returns "AASKKJ23456KJSDJKDF235ESDF" or whatever when I call it. Keep that file out of version control so your keys never end up in a public repository.
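
For the curious, here is a rough sketch of what a "credentials.py" like that could look like. Only the file name and key() come from my setup; secret() and connect_s3() are hypothetical helpers added for illustration, and reading from environment variables (with placeholder fallbacks) is a variation on simply hard-coding the strings. It assumes Boto 2's boto.connect_s3, so treat it as a starting point rather than the way this repository does things.

```python
# credentials.py (sketch): keep this file out of version control.
import os


def key():
    """Return the AWS Access Key ID (the default is a placeholder, not a real key)."""
    return os.environ.get("AWS_ACCESS_KEY_ID", "YOUR_ACCESS_KEY_ID")


def secret():
    """Return the AWS Secret Access Key (hypothetical helper, not in the original)."""
    return os.environ.get("AWS_SECRET_ACCESS_KEY", "YOUR_SECRET_ACCESS_KEY")


def connect_s3():
    """Open an S3 connection with Boto 2 using the values above (hypothetical helper)."""
    # Imported here so the module still loads on a machine without Boto installed.
    import boto
    return boto.connect_s3(aws_access_key_id=key(),
                           aws_secret_access_key=secret())
```

Other scripts could then do "import credentials" and call credentials.connect_s3() without ever containing the keys themselves.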

Please request functionality, suggest how to make this repository more professional, help me add tests, offer me a job, etc.