UBC-MDS / DSCI-554_MDS_Retention

Survey Design and Experiment for DSCI 554
MIT License

Describe how you plan to analyze the survey results (e.g., what statistical test(s) do you plan to employ?). #3

Closed MikeYuanMY closed 5 years ago

MikeYuanMY commented 5 years ago

Answer this question for the proposal:

Describe how you plan to analyze the survey results (e.g., what statistical test(s) do you plan to employ?).

MikeYuanMY commented 5 years ago

Just brainstorming as a group here. Feel free to provide feedback, appreciate it :)

Here are some ideas I have for this:

MikeYuanMY commented 5 years ago

Off the top of my head: if we measure retention purely numerically, without splitting respondents into any categorical groups, then all we are doing is averaging CMR. Going further, if we want to analyze the association between hours spent and retention, we would want to do a linear regression. If we keep Y as the CMR, and hence continuous, we are looking at multiple linear regression (do we want to consider a GLM?). If we instead convert the CMR into a categorical variable, say grades, it becomes a multi-class classification problem (ordinal, since grades are ordered). (edited) -- Harjyot
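To make the regression idea concrete, here is a minimal sketch of the simplest case: one predictor (hours) and a continuous CMR. The data are invented for illustration, and `fit_simple_ols` is a hypothetical helper, not something from our repo. A real analysis would add the other covariates and use a proper modelling library.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squares of x and cross-products of x and y around their means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data (made up): study hours and retention score in percent
hours = [2, 4, 6, 8, 10]
cmr = [40, 60, 60, 80, 100]

intercept, slope = fit_simple_ols(hours, cmr)
print(f"intercept={intercept:.1f}, slope={slope:.1f} points per extra hour")
```

The slope is the quantity of interest for inference: the estimated change in CMR per additional hour.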

MikeYuanMY commented 5 years ago

Since we are only marking them on 5 questions and they are multiple choice, maybe we should not treat the mark as continuous? You can only get 0, 20, 40, …, 100%.
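A small sketch of what "not continuous" means here, assuming each of the 5 questions is simply right or wrong. The names `possible_marks` and `to_correct_count` are hypothetical. Storing the mark as a count of correct answers out of 5 is one option, since a count out of a fixed number of questions suits a binomial-type model better than a normal-error one.

```python
N_QUESTIONS = 5  # from the thread: five multiple-choice questions

# Every achievable percentage mark when each question is right or wrong
possible_marks = [100 * k // N_QUESTIONS for k in range(N_QUESTIONS + 1)]

def to_correct_count(percent_mark, n_questions=N_QUESTIONS):
    """Convert a percentage mark back to the number of correct answers."""
    count = round(percent_mark * n_questions / 100)
    # Reject marks that cannot occur with this number of questions
    if count * 100 != percent_mark * n_questions:
        raise ValueError(f"{percent_mark}% is not achievable with {n_questions} questions")
    return count

print(possible_marks)
print(to_correct_count(60))
```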

MikeYuanMY commented 5 years ago

Sayanti [11:02 AM]: Points above that I second: linear regression with Y as CMR. Also, I am thinking of the score as discrete in this case. Something else I am thinking about: we have sleep hours and study hours, so there may be an interaction between them.
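To spell out what an interaction between sleep hours and study hours would mean in a linear model, here is a rough sketch. The coefficient values are placeholders chosen for illustration, not estimates, and the function names are hypothetical. With an interaction term, the effect of one more study hour is no longer a single number: it depends on how much the person sleeps.

```python
def predicted_score(sleep, study, b0, b_sleep, b_study, b_inter):
    """Linear predictor with a sleep x study interaction term."""
    return b0 + b_sleep * sleep + b_study * study + b_inter * sleep * study

def study_slope(sleep, b_study, b_inter):
    """Change in predicted score per extra study hour at a given sleep level."""
    return b_study + b_inter * sleep

# Placeholder coefficients (assumed, for illustration only)
coefs = dict(b0=20, b_sleep=2, b_study=3, b_inter=0.5)

for sleep in (5, 8):
    slope = study_slope(sleep, coefs["b_study"], coefs["b_inter"])
    print(f"sleep={sleep}h: one more study hour adds {slope} points")
```

If the fitted interaction coefficient were close to zero, the two slopes would coincide and the simpler additive model would do.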

MikeYuanMY commented 5 years ago

Harjyot Kaur [11:19 AM]: 10 seems like a handful. I say we don't state the specific number of questions we intend to ask to measure retention for now in milestone 1. We make the survey, and the four of us could take it with a varying number of questions, say 5, 7 and 10, and then decide on an optimum length.

Harjyot Kaur [11:22 AM]: And we could have 10 marks, but the number of questions contributing to them could be, say, 7. We could also ask a multiple-choice question with more than one correct answer. For example: given such-and-such data, which techniques would you deploy? The person has to read only one question but can earn multiple marks by giving more than one right answer.
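One possible way to score such a scheme, as a sketch only: the split of 10 marks over 7 questions (six 1-mark questions plus one 4-mark multi-select) and the rule of ignoring wrong selections are both assumptions made for this example, and the function names are hypothetical.

```python
def score_multi_select(selected, correct, marks_per_correct=1):
    """One mark per correct option chosen; wrong selections are ignored here.
    (Assumed rule: a real rubric might penalise wrong picks.)"""
    return marks_per_correct * len(set(selected) & set(correct))

def total_fraction(marks_earned, marks_available):
    """Overall score as a fraction between 0 and 1."""
    return sum(marks_earned) / sum(marks_available)

# Assumed layout: 6 single-answer questions (1 mark each) + 1 multi-select (4 marks)
marks_available = [1, 1, 1, 1, 1, 1, 4]

multi = score_multi_select({"A", "C", "D"}, {"A", "B", "C", "D"})
marks_earned = [1, 0, 1, 1, 1, 0, multi]

total = total_fraction(marks_earned, marks_available)
print(f"{sum(marks_earned)}/{sum(marks_available)} marks, fraction={total}")
```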

shayne [11:22 AM]: I think a GLM makes sense if we look at Y as a continuous score between 0 and 1

If we don't do this, I think it will be difficult to answer our main question.

We are going for a regression model for inference, not focusing on prediction accuracy.
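As a sketch of what a GLM for a score between 0 and 1 could look like: the thread does not fix the family or link, so the binomial family with a logit link used below is an assumption, as are the data and the helper `fit_binomial_logit`. The score is treated as the number of correct answers out of a fixed number of questions, and the model is fitted by Newton-Raphson with a single predictor. In practice we would use a library routine such as R's `glm` or the statsmodels `GLM` class instead of hand-rolling this.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_binomial_logit(x, successes, n_trials, n_iter=25):
    """Fit P(correct) = sigmoid(b0 + b1*x) by Newton-Raphson.

    successes[i] is the number of correct answers out of n_trials for
    respondent i. Returns the estimates (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, successes):
            p = sigmoid(b0 + b1 * xi)
            resid = yi - n_trials * p      # gradient contribution
            w = n_trials * p * (1.0 - p)   # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data (made up): study hours and correct answers out of 5
hours = [0, 2, 4, 6, 8]
correct = [1, 2, 3, 3, 4]

b0, b1 = fit_binomial_logit(hours, correct, n_trials=5)
print(f"b0={b0:.3f}, b1={b1:.3f} (log-odds of a correct answer per extra hour)")
```

Since the goal is inference rather than prediction, the slope `b1` and its uncertainty are what we would report, which a library fit provides along with standard errors.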

MikeYuanMY commented 5 years ago

[Screenshot attached: Screen Shot 2019-03-29 at 11 37 39]

MikeYuanMY commented 5 years ago

addressed in c8253c0e23b3043f6f24e5c3d723cb2e75179259