Sumanshekhar17 / Geophysical-Data-Analysis

This repository consists of some examples to get started with Python (the introductory examples and materials are taken from Kristin Thyng's course on Python in geosciences). The focus is on the Geophysical Data Analysis course lectures, taught by Professor John Wilkin and Professor Bob Chant at Rutgers University.
MIT License

Why can't I just go with the correlation coefficient value? Why is there a need for hypothesis testing in Assignment 2? #1

Closed Sumanshekhar17 closed 1 year ago

Sumanshekhar17 commented 1 year ago

The correlation coefficient provides a measure of the strength and direction of the relationship between two variables, but it does not provide information about the statistical significance of the relationship. The correlation coefficient could be high simply due to chance if the sample size is small or if there is a large amount of noise in the data.
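To make the "high simply due to chance" point concrete, here is a minimal sketch using only the Python standard library. It draws pairs of completely independent random samples and counts how often the correlation coefficient still exceeds 0.8 in magnitude. The helper names (`pearson_r`, `fraction_large_r`), the 0.8 threshold, and the sample sizes are illustrative assumptions, not part of the assignment.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_large_r(n, threshold=0.8, trials=2000, seed=0):
    """Fraction of trials in which two INDEPENDENT samples of size n
    give |r| above the threshold (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to x by construction
        if abs(pearson_r(x, y)) > threshold:
            hits += 1
    return hits / trials


small = fraction_large_r(n=5)    # very short records
large = fraction_large_r(n=100)  # longer records
print(f"n=5:   fraction with |r| > 0.8 = {small:.3f}")
print(f"n=100: fraction with |r| > 0.8 = {large:.3f}")
```

With only five points per sample, a noticeable share of the unrelated pairs should still look strongly correlated, while with 100 points this essentially never happens. That gap is what the hypothesis test quantifies.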

Hypothesis testing provides a way to assess the statistical significance of the relationship between two variables, that is, to judge whether the observed relationship could plausibly have arisen by chance or is likely to be real. The hypothesis test calculates a p-value, which is the probability of observing a correlation as extreme or more extreme than the one calculated from the sample data, assuming that the null hypothesis is true. The null hypothesis in this case is that the two variables are not correlated.
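For reference, one common way to obtain this p-value is the t-test for a Pearson correlation. The sketch below assumes the data are approximately bivariate normal and the n samples are independent; the thread does not say which test Assignment 2 expects, so treat this as one standard option rather than the required method. Here r is the sample correlation and n the number of paired observations.

```latex
% Test statistic under H_0 (true correlation = 0), with n - 2 degrees of freedom:
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}

% Worked example: the same r, two different sample sizes
% r = 0.9, n = 4  : t = 0.9 \sqrt{2 / 0.19}  \approx 2.92,  two-sided p \approx 0.10
% r = 0.9, n = 30 : t = 0.9 \sqrt{28 / 0.19} \approx 10.9,  two-sided p \ll 0.001
```

The same r = 0.9 is not significant at the 0.05 level with four points but is overwhelmingly significant with thirty, which is why the coefficient alone is not enough.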

A small p-value (typically less than 0.05) indicates strong evidence against the null hypothesis and supports the alternative hypothesis that the two variables are correlated. A large p-value (typically greater than 0.05) indicates weak evidence against the null hypothesis and supports the conclusion that the relationship between the two variables is not statistically significant and may be due to chance.

In summary, the correlation coefficient alone says nothing about statistical significance, so it is important to perform a hypothesis test in addition to calculating it. The test indicates whether the observed relationship between the two variables is likely to be real or could plausibly be due to chance.

Sumanshekhar17 commented 1 year ago

What do you mean by statistical significance? Suppose I have two datasets, I calculate the correlation coefficient, and it turns out to be near 1, which shows that they are positively correlated. Shouldn't I just stop my analysis here? Does assessing statistical significance mean that I have to collect the datasets again and again, calculate the correlation coefficient each time, and then look at how it varies? Is that what the p-value means?

Sumanshekhar17 commented 1 year ago

Statistical significance means that a result as strong as the one observed would be unlikely to arise from chance alone. In the context of a correlation coefficient, a statistically significant result indicates that the observed relationship between the two variables is unlikely to be a product of chance and is likely to be real.

Calculating the correlation coefficient between two sets of data gives you a measure of the strength and direction of the relationship between the two variables, but it does not provide information about the statistical significance of the relationship. The correlation coefficient could be high simply due to chance if the sample size is small or if there is a large amount of noise in the data.

To assess the statistical significance of the relationship between two variables, you need to perform hypothesis testing and calculate a p-value. The p-value represents the probability of observing a correlation as extreme or more extreme than the one calculated from the sample data, assuming that the null hypothesis is true. The null hypothesis in this case is that the two variables are not correlated. A small p-value (typically less than 0.05) indicates strong evidence against the null hypothesis and supports the alternative hypothesis that the two variables are correlated. A large p-value (typically greater than 0.05) indicates weak evidence against the null hypothesis and supports the conclusion that the relationship between the two variables is not statistically significant and may be due to chance.
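On the question of collecting the data again and again: a p-value can be estimated from the one dataset you already have. One way, sketched below, is a permutation test, which is swapped in here for illustration and is not necessarily the method used in the course. Shuffling one variable destroys any real pairing, so the shuffled correlations show what chance alone produces. The data values, function names, and number of shuffles are all hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_shuffles=5000, seed=0):
    """Two-sided permutation p-value for the correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)  # break the pairing: simulates "no correlation"
        # small tolerance so floating-point ties count as "as extreme"
        if abs(pearson_r(x, y_shuffled)) >= observed - 1e-12:
            count += 1
    # +1 counts the observed arrangement itself, so p is never exactly 0
    return (count + 1) / (n_shuffles + 1)


# Hypothetical example 1: four points with r close to 1
x_small = [1.0, 2.0, 3.0, 4.0]
y_small = [1.2, 1.9, 3.4, 3.8]
r_small = pearson_r(x_small, y_small)
p_small = permutation_p_value(x_small, y_small)

# Hypothetical example 2: thirty points, linear trend plus noise
noise = random.Random(1)
x_large = [float(i) for i in range(30)]
y_large = [xi + noise.gauss(0, 5) for xi in x_large]
r_large = pearson_r(x_large, y_large)
p_large = permutation_p_value(x_large, y_large)

print(f"n=4:  r = {r_small:.3f}, p = {p_small:.3f}")
print(f"n=30: r = {r_large:.3f}, p = {p_large:.4f}")
```

In the four-point case r is close to 1, yet the p-value should stay above 0.05, because 2 of the 24 possible orderings of four values already give a correlation this strong. With thirty points, a weaker r should come out highly significant. So a correlation near 1 is not by itself a reason to stop the analysis.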