DSCI-310 / data-analysis-review-2021


Submission 8: Using Regression to Predict Student Exam Performance by Study Time #8

Open ttimbers opened 2 years ago

ttimbers commented 2 years ago

Submitting authors: @anamhira47 @tonyliang19 @isabelalucas @snowwang99

Repository: https://github.com/DSCI-310/DSCI-310-Group-8

Abstract/executive summary: In this project, we explore and predict students' exam performance on the topic of Electrical DC Machines from their study time, using linear regression (LN) and the K-nearest neighbors (K-NN) algorithm. The results could help students gain insight into the study time needed to reach a specific score, as well as help instructors better understand student performance.

As a result of our analysis, we found the root mean squared prediction error (RMSPE) of our LN model to be 0.281, while the RMSPE of the K-NN model is 0.257. Both types of regression have a prediction error of about 40% (so our accuracy is about 60%), although the K-NN model performs slightly better than the LN model here.
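
For readers unfamiliar with the metric, RMSPE is simply the root mean squared error evaluated on held-out (prediction) data. A minimal sketch of how it might be computed with the yardstick package (synthetic values and hypothetical column names, not the authors' code):

```r
# A minimal sketch (not the authors' code) of computing RMSPE with yardstick;
# the data frame below is synthetic and the column names are hypothetical.
library(tibble)
library(yardstick)

# Hypothetical held-out predictions from the two models.
predictions <- tibble(
  exam_score = c(0.60, 0.45, 0.80, 0.30, 0.70),
  lm_pred    = c(0.55, 0.50, 0.72, 0.41, 0.66),
  knn_pred   = c(0.58, 0.47, 0.76, 0.35, 0.69)
)

# RMSPE = root mean squared error on the held-out (prediction) set.
rmse(predictions, truth = exam_score, estimate = lm_pred)
rmse(predictions, truth = exam_score, estimate = knn_pred)
```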

This can be attributed to the fact that exam performance can be affected by other external factors such as health, IQ, stress levels, and learning ability, and our dataset may not be large enough to establish a direct relationship between study time alone and exam performance.

The dataset we used was the User Knowledge Modeling Dataset provided by the UCI Machine Learning Repository.

Editor: @ttimbers

Reviewers: @rpeng35 @clichyclin @harrysyz99

clichyclin commented 2 years ago

Reviewer: @clichyclin

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

Review Comments:

First, I want to say congratulations to all of you! I really enjoyed your project and its general structure. I know it's not easy to create a project like this, so kudos to all of you. Keep it up!

  1. I like your README file: it is easy to follow and clearly explains how to reproduce the results you got.
  2. This is a minor thing: the names of the scripts/functions are not easy to understand. A good example is "visualize_r", which I believe returns a ggplot point graph. I suggest using names that make it clear what each script/function does; for example, if a function visualizes a box plot, a clear name might be plot_boxplot (see the sketch after this list).
  3. The research question is not quite clear. Your title is clear and I can easily tell what your analysis is doing, but the project doesn't have an explicit research question. What was done well: the title, the explanation of how the research can help people (particularly students), and how it can be reproduced. Please include a research question, for example: Does the amount of time students spend studying affect their exam performance?
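
To illustrate point 2, here is a hypothetical example (not from your repository) of a descriptively named plotting function:

```r
# A hypothetical example of a descriptively named function: the name states
# exactly what the function draws. Column names are passed in by the caller.
library(ggplot2)

plot_boxplot <- function(data, group_col, score_col) {
  ggplot(data, aes(x = .data[[group_col]], y = .data[[score_col]])) +
    geom_boxplot() +
    labs(x = group_col, y = score_col)
}

# Usage (hypothetical data frame and column names):
# plot_boxplot(students, group_col = "study_time_level", score_col = "exam_score")
```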

Please feel free to reach out to me if something is not clear. (clichyclin@gmail.com)

Again, congratulations!

Attribution

This was derived from the JOSE review checklist and the rOpenSci review checklist.

rpeng35 commented 2 years ago

Data analysis review checklist

Reviewer: @rpeng35

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

1.5

Review Comments:

  1. Installation instructions were clear and easy to follow; however, after following all the steps and running `make all`, it produced an error on my end. I had to manually change back to the correct directory to make it work (see the sketch after this list).
  2. For dependencies, I personally think it is easier to follow if you list all the dependencies in your README instead of redirecting readers to another file (the .yml file in this case).
  3. As noted in the comment above, some of the names of your scripts are too generic. It would be better to give them more specific names.
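
On point 1, one possible fix (a sketch only; the file paths below are hypothetical, not your repository's actual layout) is to build paths with the here package so the scripts resolve files from the project root, regardless of the directory `make` or `Rscript` is invoked from:

```r
# A sketch only (hypothetical paths, not the repository's actual script):
# here() resolves paths from the project root rather than from whatever
# working directory the script happens to be run in.
library(here)
library(readr)

raw <- read_csv(here("data", "raw", "user_knowledge.csv"))
write_csv(raw, here("data", "processed", "clean_data.csv"))
```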

Attribution

This was derived from the JOSE review checklist and the rOpenSci review checklist.

harrysyz99 commented 2 years ago

Data analysis review checklist

Reviewer: @harrysyz99

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

2.5

Review Comments:

  1. I think prepare_data.r packs multiple tasks into a single R script. It might be a better idea to separate it into different scripts, with each function doing only one thing, or at least to document what each function is used for. For example, I had no idea what the list_cor function inside the script does without taking a close look at its code; a simple piece of documentation would be nice (see the sketch after this list).
  2. I think it would be better to run `make clean` first instead of running `make all` first, since we want to remove any pre-existing documents/files, and you definitely don't want those files to affect reproducibility.
  3. Some of the cross-references are not generating properly. I think you could fix them easily.
  4. Some of your dependencies do not have a version pinned.
    • For example, docopt has no version specified.
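
On point 1, a minimal sketch of the kind of documentation that would help; the signature and body of list_cor below are a guess at what the function might do, not your actual code:

```r
# A sketch of roxygen2-style documentation; the implementation of list_cor
# below is hypothetical, not the repository's actual function.

#' List the correlation of each numeric predictor with a target column.
#'
#' @param data A data frame whose columns are all numeric.
#' @param target The name of the column to correlate the other columns with.
#' @return A data frame with one row per predictor and its correlation.
list_cor <- function(data, target) {
  predictors <- setdiff(names(data), target)
  data.frame(
    predictor   = predictors,
    correlation = sapply(predictors, function(p) cor(data[[p]], data[[target]]))
  )
}
```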

Overall it is a wonderful project and I like it, but there are some minor things that need to be changed.

Nice work.

Attribution

This was derived from the JOSE review checklist and the rOpenSci review checklist.