nbrosowsky opened 6 years ago
This is good!
Does the capital letter filter remove the whole word in the sense of removing all of the other letters in the word?
We could keep the other letters, which would give us more observations.
More generally, having a pre-processing thread like this is a very good idea.
Yeah, currently it removes the whole word.
If you change "whole_word" to "Letters" it'll just remove the individual letters.
Currently I just eliminated all IKSIs above 2000 ms, which is a somewhat arbitrary cut-off. We have adopted a standard practice of using the Van Selst & Jolicoeur procedure, so I still need to add that.
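For reference, the 2000 ms cut-off described above amounts to a simple filter. A minimal sketch, assuming the interkeystroke intervals live in a column I'm calling `IKSIs` (hypothetical name):

```r
# Absolute cut-off: drop implausibly long interkeystroke intervals.
# "IKSIs" is a placeholder column name for the interval measure.
the_data <- the_data[the_data$IKSIs <= 2000, ]
```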
Always worth noting that outlier elimination is a never-ending issue, in the sense that we have enormous degrees of freedom to try any number of different elimination schemes. The worst, and most useless, thing we could do would be to automate the process of trying a million different techniques and then pick the one that "makes the data better".
We do need to justify the practice that we do adopt. One thing to do is be consistent (for example, we should remind ourselves what we did for Behmer & Crump, 2016) and do the same thing here. If our findings depend on our choice of outlier elimination procedure, then we know that something is wrong with our experiment, and we are probably just measuring noise. So, another gut check here is to try a couple of reasonable elimination procedures that get rid of the massive numbers (e.g., nobody takes 1000000 seconds to type a letter; those should be removed because one of the participants must have left to make a sandwich or something).
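One reasonable procedure of this sort is recursive trimming at a fixed SD criterion. The sketch below is a simplified stand-in, not the actual Van Selst & Jolicoeur moving-criterion procedure (which scales the SD cut-off by cell sample size); the function name and default criterion are my own:

```r
# Recursive SD-based trim: repeatedly remove observations more than
# `criterion` SDs from the mean until nothing else is removed.
# Simplified stand-in for Van Selst & Jolicoeur, not their procedure.
trim_recursive <- function(x, criterion = 2.5) {
  repeat {
    if (length(x) < 3) return(x)                    # too few observations to trim
    keep <- abs(x - mean(x)) <= criterion * sd(x)
    if (all(keep)) return(x)                        # converged: nothing left to remove
    x <- x[keep]
  }
}
```

Comparing the cell means before and after a trim like this would be one way to check whether our conclusions hinge on the elimination scheme.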
Some elimination procedures are:
Cool, changing to "Letters" just does that. I like it when that stuff is easy.
I was looking at the data and found entries whose whole_word values end in punctuation (e.g., "Felis." or "vertebrae."), so their word_lengths are 1 more than the actual word length because of the punctuation. This code corrects the word lengths:
```r
# Flag words whose final character is punctuation (e.g., "Felis.")
ends_in_punct <- grepl("[[:punct:]]$", the_data$whole_word)

# Subtract 1 so word_lengths counts letters only
the_data$word_lengths[ends_in_punct] <- the_data$word_lengths[ends_in_punct] - 1

# Spot-check the correction
the_data[the_data$whole_word == "vertebrae.", ]$word_lengths
```
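Since the later steps go through a dplyr pipeline anyway, the same correction could be expressed there. A sketch, assuming the same `the_data` columns (`whole_word`, `word_lengths`):

```r
library(dplyr)

# dplyr version of the trailing-punctuation correction above
the_data <- the_data %>%
  mutate(word_lengths = ifelse(grepl("[[:punct:]]$", whole_word),
                               word_lengths - 1,
                               word_lengths))
```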
Great, I added that to my pre-processing.
I started to dig into the data a little bit and noticed there are probably some things we want to clean up:
I added this to my dplyr pipeline to clean that up: