UBC-MDS / DSCI_522_Group_34

Add specific method description to proposal in README.md #23

Closed: hsmohammed closed this issue 3 years ago

hsmohammed commented 3 years ago

I have read your proposal and I find your choice of data set quite interesting. There is one thing you need to say more about: what methods do you intend to use to answer your research question?

KangboLu commented 3 years ago

@hsmohammed We described in the README how we are going to answer the question:

We propose to carry out exploratory data analysis, determine which features and columns to retrieve to support our hypothesis test, and then choose a suitable test flavour to verify whether the median number of graffiti per location in Vancouver's downtown area differs from the median number of graffiti per location in Vancouver's Strathcona area.

According to the data visualizations, a suitable estimator for our research question is the median since we are interested in the most common number of graffiti in the two regions and the median is not as sensitive to extreme values as it is mathematically defined by the middle value of a distribution.

And also in the EDA here:

Based on the data visualizations, we can say a suitable estimator for our research question is the median because we are interested in the most common number of the graffiti in the two regions and the central tendency of the distribution is most helpful for this given the shape of this sample distribution is bell-shaped and skewed to the right. The long right tail of the distributions also makes the median the suitable test flavor because the median is not as sensitive to extreme values as it is mathematically defined by the middle value of a distribution.

Are you asking for the method in more detail, such as the specific steps? If so, we will add that in milestone 2, since that is when we will be doing that work, and we will update the README.md along the way.

hsmohammed commented 3 years ago

@KangboLu the median is your statistic. What I mean is: what method will you use to compare the sample medians and evaluate your hypothesis?

hsmohammed commented 3 years ago

In the milestone 01 instructions you will see this point mentioned in the proposal requirements section.

"Make a plan of how you will analyze the data (report an estimate and confidence intervals? hypothesis test? classification with a decision tree?). Choose something you already learned how to do in another MDS course."

KangboLu commented 3 years ago

We mentioned that we would do a hypothesis test, and we also mentioned that we would use the median after the EDA.

Here we attempt to conduct a hypothesis test with a suitable test flavour to answer a statistical research question,

Based on the data visualizations, we can say a suitable estimator for our research question is the median because we are interested in the most common number of graffiti in the two regions and the central tendency of the distribution is most helpful for this given the shape of this sample distribution is bell-shaped and skewed to the right. The long right tail of the distributions also makes the median the suitable test flavor because the median is not as sensitive to extreme values as it is mathematically defined by the middle value of a distribution.

Do you want us to mention the keywords "hypothesis test", "median", and "permutation test of difference in median"?
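
For readers following the thread, a permutation test of the difference in medians could look roughly like the sketch below. It is not the group's actual analysis: the function name, the number of permutations, and the example counts are all made up for illustration, and Python with NumPy is an assumption, since the thread does not say which language the project uses. The idea is to pool the two samples, repeatedly reshuffle the area labels, and see how often a reshuffled difference in medians is at least as extreme as the observed one.

```python
import numpy as np


def permutation_test_median_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in medians.

    Hypothetical helper for illustration; returns (observed_diff, p_value).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Test statistic on the data as observed.
    observed = np.median(a) - np.median(b)

    # Under the null hypothesis the group labels are exchangeable,
    # so pool the samples and reassign labels at random.
    pooled = np.concatenate([a, b])
    n_a = a.size
    null_diffs = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(pooled)
        null_diffs[i] = np.median(shuffled[:n_a]) - np.median(shuffled[n_a:])

    # Share of shuffled differences at least as extreme as the observed one.
    # Adding 1 to numerator and denominator keeps the p-value above zero.
    extreme = np.sum(np.abs(null_diffs) >= abs(observed))
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up graffiti counts per location, NOT the real Vancouver data.
downtown = [1, 2, 2, 3, 5, 8, 13, 40]
strathcona = [1, 1, 2, 2, 3, 4, 6, 9]

observed, p_value = permutation_test_median_diff(downtown, strathcona)
print(f"observed difference in medians: {observed}")
print(f"approximate two-sided p-value: {p_value:.3f}")
```

Naming the test this concretely in the README (statistic, null hypothesis, resampling scheme, and significance level) would address the reviewer's question about the method, as opposed to the statistic.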