North-Seattle-College / ad440-winter2022-tuesday-repo

North Seattle College AD 440 Winter 2020 Cloud Practicum class repository
Apache License 2.0

Demo and explain the clustering algorithm for length of feedback #152

Open toddysm opened 2 years ago

toddysm commented 2 years ago
Eftu-Wakjira commented 2 years ago

User Story

As a developer, I am expected to demo the insights generated from the floop_dataset and to try clustering algorithms that group the feedback into categories.

Estimated time to complete: 4 hours
Actual time to complete: 4 hours

Findings:

The dataset currently has only two data points for each feedback message:

  1. What the message was
  2. Who sent the message

I started with some exploratory data analysis on this dataset to understand its patterns better. I found that the majority of conversations consist of only one message. This means that feedback is most often one-way: the student does not reply to the teacher about the feedback submitted.

[image: screenshot of the notebook output]
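The one-message finding can be checked by counting messages per conversation. The sketch below is a minimal illustration, not the notebook's code. It assumes each record also carries a conversation identifier (the hypothetical `conversation_id` key), which is not among the two items listed above, and the sample rows are invented.

```python
from collections import Counter

# Invented sample rows; `conversation_id` is a hypothetical key that the
# real floop_dataset may name differently or may not expose at all.
messages = [
    {"conversation_id": "c1", "sender": "teacher", "text": "Good"},
    {"conversation_id": "c2", "sender": "teacher", "text": "I like this"},
    {"conversation_id": "c3", "sender": "teacher", "text": "Can you explain step 2?"},
    {"conversation_id": "c3", "sender": "student", "text": "Sure, I divided both sides by 3."},
]


def conversation_lengths(rows):
    """Return {conversation_id: number of messages}."""
    return Counter(row["conversation_id"] for row in rows)


def length_distribution(rows):
    """Return {conversation length: number of conversations of that length}."""
    return Counter(conversation_lengths(rows).values())


def single_message_share(rows):
    """Fraction of conversations that contain exactly one message."""
    lengths = conversation_lengths(rows)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths.values() if n == 1) / len(lengths)


distribution = length_distribution(messages)
share = single_message_share(messages)
print(dict(distribution))
print(f"{share:.0%} of conversations have a single message")
```

On the real data, a high single-message share would confirm that most conversations are one-way.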

The short conversations might have several causes:

  1. The teacher's feedback is descriptive rather than action-oriented (e.g., the teacher says "Good", "I like this", etc.), so it gives the student nothing to respond to.
  2. Students have a lower tendency to reply to teachers. This could be a behavioral instinct.

Next steps

Clustering on the text alone does not seem sufficient to answer all our questions. A better idea would be to combine features such as Question/No-Question, Emotion, and Sentiment in the clustering model, and then come up with a strategy for increasing the length of conversations between the teacher and the student.
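As a rough illustration of combining such features, the sketch below turns each message into a small vector (word count, question flag, toy sentiment score) and groups the vectors with a plain k-means loop. Everything here is an assumption made for illustration: the sample messages, the `POSITIVE` and `NEGATIVE` word lists, the choice of k-means, and the farthest-point initialization are not taken from the notebook, and the Emotion feature is left out.

```python
# Hypothetical word lists for a toy sentiment score; a real analysis would
# use a proper sentiment model instead.
POSITIVE = {"good", "great", "like", "nice", "well"}
NEGATIVE = {"wrong", "unclear", "missing", "poor"}


def features(text):
    """Map one feedback message to [word count, has question, sentiment]."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    sentiment = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return [float(len(words)), 1.0 if "?" in text else 0.0, float(sentiment)]


def scale(rows):
    """Min-max scale each column to [0, 1] so word count does not dominate."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - l) / s for v, l, s in zip(row, lo, span)] for row in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic start: first point, then the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist2(p, centers[c])) for p in points]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers)


# Invented sample feedback messages.
feedback = [
    "Good",
    "I like this",
    "Nice work",
    "Can you explain how you got the answer in step 2?",
    "Why did you choose this method, and what would change if x were negative?",
    "What is missing from your second paragraph?",
]

labels = kmeans(scale([features(t) for t in feedback]), k=2)
for text, label in zip(feedback, labels):
    print(label, text)
```

On this toy input the short descriptive comments and the longer questions fall into separate clusters. With real data, the feature set and the number of clusters would need to be chosen by inspecting the results.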

Instructions for running notebook

The analysis was done on a SageMaker notebook that can be found here

  1. Open the notebook file linked above
  2. Run the cells to regenerate output
  3. Confirm that the output matches the screenshot above