dtherrick / ds-notebooks

Jupyter Notebooks that I'm currently developing in one form or another.

Amazon Reviews #1

Open: alfaraday opened this issue 6 years ago

alfaraday commented 6 years ago

Looks good @dtherrick! Nothing stands out as needing to be changed in the example, but we'll need to be thoughtful about the way we turn this walkthrough into a challenge.

We've got the main ask at the top, which is to use Spark to determine whether we can predict if a review is positive or negative based on the language in the review.

Then at the bottom, there are some extension tasks:

Here's where you can go from here:

- Think about resampling the overall dataset to better balance positive and negative reviews.
- Use a different method to tokenize and convert the text to numeric features (TF-IDF, etc.).
- Adjust the parameters of your classifier.

Do you want to incorporate those tasks into the challenge itself, or will they just live in the solution?

dtherrick commented 6 years ago

@alfaraday I think I understand where you're going: we need a second document that is the exercise itself. Essentially, this notebook is the solution to the problem.

Addressing your three bullet points above:

As for the extension tasks, I included them to spur students to think more like professional data scientists. In other words, this solution should be treated as a first-pass result. The extension tasks are there to help students think about how they would move on to the next steps toward a production model.

This should be straightforward to pull from the Amazon Jupyter notebook, but I think we should discuss the best delivery approach for it. We can also talk it through on Slack if needed. Thanks!

alfaraday commented 6 years ago

@dtherrick Okay cool, this all sounds good! I'll take a stab at writing the challenge instructions and share them with you before we meet on Friday.