@talhaadnan100 and @akanshaVashisth, this is an interesting prediction problem, and I look forward to reading more of your analysis. Here are some points for improvement and a few minor suggestions for your project.
Reasoning
You have not stated a specific question for your analysis. You have a predictive analysis at hand, but what is your motivation for doing it? Do you want to find out which features are most important in predicting income level? Do you want to find out whether a person would have cancer given their age and gender? Are you trying to build a model that reaches a certain level of accuracy? In short, what question are you trying to answer? You can refer to the issue Tiffany created in the students repo here for more information on this point.
Keep in mind that your README is the landing page for anyone examining your analysis, so a few additions would make it more informative and engaging for visitors. In particular, you can:
include a brief introduction to the context and your motivation for this analysis.
explain the data in more detail, for example by listing the variables it contains or showing the first few rows of the data table in the main README. Linking to the data source is a good idea, but the README should still describe the data itself.
You could also briefly explain why you chose a decision tree. Why do you think this model is suitable for your analysis?
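Be careful! Splitting the data into training and test sets will not by itself prevent your model from overfitting; the test set only lets you detect overfitting after the fact. Tuning hyperparameters via cross-validation on the training set (for a decision tree, for example, its maximum depth) is the more relevant way to avoid overfitting. Make sure you understand the difference as you conduct your analysis.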
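To illustrate the point above, here is a minimal sketch that assumes a Python/scikit-learn setup. The project's actual language and data are not stated in this review, so `make_classification` generates synthetic stand-in data, and the candidate depths in `param_grid` are arbitrary illustrative values. Cross-validation on the training set chooses the tree depth, and the test set is used only once at the end.

```python
# A minimal sketch: tune a decision tree's max_depth with cross-validation
# on the training set only, then score once on the held-out test set.
# make_classification is a synthetic stand-in for the project's real data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=522)

# Hold out a test set that is never used while choosing hyperparameters.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=522
)

# 5-fold cross-validation over candidate tree depths (None = grow fully).
param_grid = {"max_depth": [1, 2, 3, 5, 8, None]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=522),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)  # refits the best depth on all training data

train_acc = search.best_estimator_.score(X_train, y_train)
cv_acc = search.best_score_              # mean accuracy across the 5 folds
test_acc = search.score(X_test, y_test)  # single, final check on unseen data

print("best max_depth:", search.best_params_["max_depth"])
print(f"train accuracy: {train_acc:.3f}")
print(f"mean CV accuracy: {cv_acc:.3f}")
print(f"test accuracy: {test_acc:.3f}")
```

A large gap between training accuracy and cross-validation accuracy is the usual sign of overfitting. The test accuracy then estimates how the tuned model generalizes, rather than being used to pick the depth.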
Be more precise when describing the expected results. For example, you would not visualize the accuracy score; you would report it.
The raw predictions are probably not needed to communicate your results. As you mention, reporting the most important features and the accuracy score would be more useful.
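As a rough sketch of reporting summaries instead of raw predictions, again assuming scikit-learn: the synthetic data, the `feature_N` names, and `max_depth=4` are hypothetical placeholders, not taken from the project. The output is one accuracy number plus a short ranked list of features.

```python
# A minimal sketch: report an accuracy score and the most important
# features of a fitted decision tree instead of a column of predictions.
# The synthetic data and the feature names are hypothetical placeholders.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# One summary number instead of the raw predictions.
test_accuracy = accuracy_score(y_test, model.predict(X_test))

# Impurity-based importances are normalized to sum to 1; rank them.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"test accuracy: {test_accuracy:.3f}")
for name, importance in ranked[:3]:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances are only one way to rank features. They are convenient for a decision tree but can favour features with many distinct values, so it may be worth stating which measure you report.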
Mechanics
The link to the data set is broken.
Always include a link when you point your readers to a specific file and/or folder in your repository.
Minor Suggestions:
I would recommend removing "DSCI_522" from the name of your project, but if you prefer it this way, that is fine.
I hope this feedback is helpful in improving your project. Please let me know if you have any questions. Good luck!