CrumpLab / EntropyTyping

A repository for collaborating on our new manuscript investigating how keystroke dynamics conform to information theoretic measures of entropy in the letters people type.
https://crumplab.github.io/EntropyTyping

Expertise and subject sensitivity to H #8

Open CrumpLab opened 6 years ago

CrumpLab commented 6 years ago

Behmer & Crump (2016) used this data set to determine whether individual typists were sensitive to letter, bigram, and trigram frequency. For each typist, they measured the mean IKSI (interkeystroke interval) for each letter, bigram, and trigram, and then correlated those mean IKSIs with letter, bigram, and trigram frequencies in natural English. So, for each subject, three correlation coefficients (Spearman's) were obtained, one each for the letter, bigram, and trigram levels. They also plotted these coefficients against each subject's mean typing time, which served as a proxy for expertise.

We can do the same thing here:

Steps

1) get the correlation between H and mean IKSI for letter position and word length for each subject
2) get mean IKSI for each subject
3) plot the correlations between H and IKSI (position by length) as a function of mean IKSI
4) do a linear regression on above and report findings
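The four steps above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the `typing` data frame, its `H` column, and every number in it are simulated placeholders, not the real data set. The names `correlation_data`, `pearson_r`, and `mean_IKSI` follow the ones used later in this thread.

```r
# Simulated stand-in data: every value below is made up for illustration.
set.seed(1)
n_subjects <- 20

# Hypothetical design: one row per letter position x word length cell.
cells <- expand.grid(position = 1:5, word_length = 1:5)
cells <- cells[cells$position <= cells$word_length, ]
cells$H <- runif(nrow(cells), min = 2, max = 4.5)   # placeholder entropy values

# Cross join subjects with cells (merge() with no shared columns).
typing <- merge(data.frame(subject = 1:n_subjects), cells)
base_speed <- rnorm(n_subjects, mean = 180, sd = 40)  # per-subject baseline in ms
typing$mean_IKSI <- base_speed[typing$subject] + 20 * typing$H +
  rnorm(nrow(typing), sd = 15)

# Step 1: correlation between H and mean IKSI within each subject.
pearson_r <- sapply(split(typing, typing$subject),
                    function(d) cor(d$H, d$mean_IKSI))

# Step 2: overall mean IKSI for each subject.
mean_IKSI <- tapply(typing$mean_IKSI, typing$subject, mean)

correlation_data <- data.frame(subject   = as.integer(names(pearson_r)),
                               pearson_r = as.numeric(pearson_r),
                               mean_IKSI = as.numeric(mean_IKSI))

# Step 3: plot the per-subject correlations against mean IKSI.
if (interactive()) plot(pearson_r ~ mean_IKSI, data = correlation_data)

# Step 4: linear regression of the correlations on mean IKSI.
expertise_mod <- lm(pearson_r ~ mean_IKSI, data = correlation_data)
print(summary(expertise_mod))
```

With the real data, `typing` would be replaced by the per-subject cell means from the pre-processed data set, and the simulation lines would be dropped.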

Walter has done this. We can report our independent findings here, and discuss what this might mean.

CrumpLab commented 6 years ago

[Plot image not preserved]

The plot looks like what Walter got in his analysis, but the numbers are different:

The correlation for me was -0.3202597, R^2 = .1025

Walter, I think one issue might be that you ran the linear regression using r as the dependent variable, and not the predictor variable. Your code was:

expertise.mod1 = lm(r ~ IKSIs, data = correlations)

but try

expertise.mod1 = lm(IKSIs ~ r, data = correlations)

and see what happens

wlai0611 commented 6 years ago

After Nick's pre-processing (issue #10):

Call:
lm(formula = IKSIs ~ r, data = correlations)

Residuals:
     Min       1Q   Median       3Q      Max
-122.244  -35.472   -6.851   29.423  182.643

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  215.005      7.954  27.032  < 2e-16 ***
r            -75.246     17.453  -4.311 2.12e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 51.06 on 344 degrees of freedom
Multiple R-squared:  0.05126,  Adjusted R-squared:  0.04851
F-statistic: 18.59 on 1 and 344 DF,  p-value: 2.122e-05

###########
cor: -0.2264159
r^2 = 0.05126415

CrumpLab commented 6 years ago

I'm getting similar results after following Nick's pre-processing steps. I eliminated capital letters rather than whole words and get cor = -.20, so pretty close.

wlai0611 commented 6 years ago

I am confused: Shouldn't r be the dependent variable and IKSI be the independent variable? "as a function of mean IKSI"

CrumpLab commented 6 years ago

Oops, yes, I was being confusing and wrong before. If we are predicting pearson_r as a function of mean_IKSI, then the formula should be reversed. But we could also say we are predicting mean_IKSI from pearson_r, and then it would stay the same. Because we only have two variables, the correlation is the same regardless of which way we do it, and so are R-squared and the p-value; only the slope and intercept change. So my earlier suggestion to switch the order was pointless.
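For reference, the standard simple-regression identities behind that point can be written out as below. The symbols are introduced here for illustration only: $x$ stands for mean_IKSI, $y$ for pearson_r, $s_x$ and $s_y$ for their sample standard deviations, and $n$ for the number of subjects.

```latex
\begin{align*}
r_{xy} &= \frac{\operatorname{cov}(x, y)}{s_x \, s_y} = r_{yx}
  && \text{(the correlation is symmetric)} \\
b_{y \mid x} &= r_{xy} \, \frac{s_y}{s_x}, \qquad
b_{x \mid y} = r_{xy} \, \frac{s_x}{s_y}
  && \text{(the two slopes differ)} \\
b_{y \mid x} \, b_{x \mid y} &= r_{xy}^2 = R^2
  && \text{(same } R^2 \text{ in both directions)} \\
t &= \frac{r_{xy} \sqrt{n - 2}}{\sqrt{1 - r_{xy}^2}}
  && \text{(same } t \text{ and } p \text{ in both directions)}
\end{align*}
```

Since $t$ depends only on $r_{xy}$ and $n$, swapping the dependent and independent variable changes the coefficient estimates but not the test of the relationship.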

Here are my results trying both formulas; they both give the same answers. So, right now I'm getting a correlation of -.257.

cor.test(correlation_data$pearson_r,correlation_data$mean_IKSI)

    Pearson's product-moment correlation

data:  correlation_data$pearson_r and correlation_data$mean_IKSI
t = -4.9471, df = 344, p-value = 1.181e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3535490 -0.1565399
sample estimates:
       cor
-0.2577211

summary(lm(pearson_r ~ mean_IKSI, data = correlation_data))

Call:
lm(formula = pearson_r ~ mean_IKSI, data = correlation_data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.44554 -0.09184  0.00596  0.10833  0.36788

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.5891506  0.0307569  19.155  < 2e-16 ***
mean_IKSI   -0.0008762  0.0001771  -4.947 1.18e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1521 on 344 degrees of freedom
Multiple R-squared:  0.06642,  Adjusted R-squared:  0.06371
F-statistic: 24.47 on 1 and 344 DF,  p-value: 1.181e-06

cor.test(correlation_data$mean_IKSI, correlation_data$pearson_r)

    Pearson's product-moment correlation

data:  correlation_data$mean_IKSI and correlation_data$pearson_r
t = -4.9471, df = 344, p-value = 1.181e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3535490 -0.1565399
sample estimates:
       cor
-0.2577211

summary(lm(mean_IKSI~pearson_r, data = correlation_data))

Call:
lm(formula = mean_IKSI ~ pearson_r, data = correlation_data)

Residuals:
     Min       1Q   Median       3Q      Max
-130.172  -30.679   -5.883   27.525  164.410

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  200.946      7.194  27.932  < 2e-16 ***
pearson_r    -75.805     15.323  -4.947 1.18e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 44.75 on 344 degrees of freedom
Multiple R-squared:  0.06642,  Adjusted R-squared:  0.06371
F-statistic: 24.47 on 1 and 344 DF,  p-value: 1.181e-06
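As a quick arithmetic check on the two fits above, here is a small sketch that uses only the numbers printed in this comment. In a simple regression the product of the two slopes should equal R-squared, and its square root (with the slopes' sign) should recover the correlation. The variable names are invented for this check.

```r
# Values copied from the outputs in this comment (rounded as printed).
slope_r_on_iksi <- -0.0008762   # from lm(pearson_r ~ mean_IKSI)
slope_iksi_on_r <- -75.805      # from lm(mean_IKSI ~ pearson_r)
reported_r      <- -0.2577211   # from cor.test()
reported_r2     <- 0.06642      # Multiple R-squared in both summaries

# Product of the two slopes should match R-squared up to rounding.
r_squared <- slope_r_on_iksi * slope_iksi_on_r

# Both slopes are negative, so the correlation takes the negative root.
recovered_r <- -sqrt(r_squared)

print(round(r_squared, 5))
print(round(recovered_r, 4))
print(abs(r_squared - reported_r2) < 1e-4)
print(abs(recovered_r - reported_r) < 1e-4)
```

Both comparisons agree to within the rounding of the printed coefficients, which is consistent with the two formulas describing the same relationship.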