alfaraday opened this issue 6 years ago
@alfaraday I think I understand where you're going - we need a second document that is the exercise itself. Essentially - this notebook is the solution to the problem.
Addressing your three bullet points above:
I would expect them to use a setup similar to the one I've used to create the notebooks so far: a Docker container hosting Spark and Jupyter, and a repository with the actual notebooks that they can push to GitHub.
Assuming they've set up Docker and have a notebooks repository on their machine (we may need to spend some time walking them through setting up that repo), they should use the `docker run` command to get a container going on their local machine. I use the following command to pull and run the thinkfulstudent pyspark image locally:

```
docker run -d --rm -p 8888:8888 -v /Users/damian/Documents/Code/ds-notebooks:/home/ds/notebooks thinkfulstudent/pyspark:2.2.1
```

The ds-notebooks repo sits on my machine and contains all my Jupyter notebooks.
The student would then solve the exercise in a Jupyter notebook and push it to GitHub (or whatever site they use). It's easy enough for a mentor to clone that repo, fire up the same container, and run the code to review it with the student.
As for the extension tasks, I included those to spur students to think more like professional data scientists. In other words, this solution should be a first-pass result. The extension tasks are there to help them think about how they would move on to the next steps to generate a production model.
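To make that concrete, a first pass along these lines might look roughly like the sketch below. This is illustrative only: the data path, column names (`review_text`, `label`), and feature choices are placeholders, not the actual notebook's code.

```python
# Rough first-pass sketch only -- path and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("review-sentiment").getOrCreate()

# Assume each record has the review text plus a 0/1 label (1 = positive).
reviews = spark.read.json("/home/ds/notebooks/data/amazon_reviews.json")

train, test = reviews.randomSplit([0.8, 0.2], seed=42)

# Simple bag-of-words features feeding a logistic regression classifier.
tokenizer = Tokenizer(inputCol="review_text", outputCol="words")
tf = HashingTF(inputCol="words", outputCol="raw_features")
idf = IDF(inputCol="raw_features", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[tokenizer, tf, idf, lr])
model = pipeline.fit(train)

model.transform(test).select("review_text", "label", "prediction").show(5)
```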
This should be straightforward to pull from the Amazon Jupyter notebook, but I think we should discuss the best delivery approach for it. We can also talk through it on Slack if needed. Thanks!
@dtherrick Okay cool, this all sounds good! I'll take a stab at writing the challenge instructions and share them with you before we meet on Friday.
Looks good @dtherrick! Nothing stands out as needing to be changed in the example, but we'll need to be thoughtful about the way we turn this walkthrough into a challenge.
We've got the main ask at the top, which is to use Spark to determine if we can predict whether a review is positive or negative based on the language in the review.
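Since the ask is literally whether the language predicts the label, answering it presumably comes down to scoring held-out reviews. Continuing the earlier sketch (`model` and `test` are defined there), that might look something like:

```python
# Continues the earlier sketch: `model` and `test` are defined there.
from pyspark.ml.evaluation import BinaryClassificationEvaluator

predictions = model.transform(test)

# Area under the ROC curve as a quick "can we predict this at all?" check.
evaluator = BinaryClassificationEvaluator(labelCol="label",
                                          rawPredictionCol="rawPrediction")
print("Test AUC:", evaluator.evaluate(predictions))
```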
Then at the bottom, there are some extension tasks:
Do you want to incorporate those tasks into the challenge itself, or will they just live in the solution?