CrumpLab / EntropyTyping

A repository for collaborating on our new manuscript investigating how keystroke dynamics conform to information theoretic measures of entropy in the letters people type.
https://crumplab.github.io/EntropyTyping

Instance theory model of letter uncertainty effects as a function of position and word length #14

Open CrumpLab opened 6 years ago

CrumpLab commented 6 years ago

One goal is to create a model that could account for letter uncertainty influences on mean IKSI across position and word length.

We get to decide whether our paper includes this model, but even if we decide to go for more of a short report direction (just reporting the data analysis), we should be thinking about developing a model(s) for explaining the results. We can decide as we go along whether or not to include the model. In general, data + model = really good paper.

It would be interesting to start with a simple instance theory, a la Gordon Logan's instance theory of automatization (1988).

The basic premise of the model is that an instance-based memory process can determine reaction time. The idea is that there are two routes to completing a response:

  1. computing the solution
  2. remembering the solution

If the problem is, what is 88*13? You could compute the solution, and that takes however long the computation takes. But you also store the answer in memory. So, if you have to do the same problem again, you could either do the computation again (taking computation time), or you could retrieve the solution from memory (taking retrieval time instead).

Memory retrieval speed is modeled in terms of sampling extreme values from a distribution. For example, consider making a response to the stimulus X. Every time you do it, you lay down a unique memory trace. Memory traces are not created equal, so some of them are easier to retrieve and some are harder. If you've done this, say, 10 times, then when you do it the 11th time, you have the possibility of retrieving one of your 10 available memory traces. Which one will you retrieve first? The fastest one, that is, the one that is easiest to retrieve.

So, we can model instance-based memory retrieval in terms of an expanding distribution of memories, each with their own retrieval times. We can simply assume a normal distribution, or another distribution (doesn't matter too much) for the retrieval times. Every time we store a memory for a single response, we add that memory to the pool of instances for that stimulus-response pair. What we add is a retrieval time. So for example, for the first 10 items, we might randomly sample the following retrieval times into the distribution of retrieval times:

400, 150, 200, 210, 400, 300, 175, 180, 190, 210

For each new experience we sample in another retrieval time. Notice what happens. As this distribution grows, most of the numbers will not be extreme values; they will be somewhere in the middle. But as we accumulate lots and lots of instances, we increase the probability of sampling in a new, really fast memory. In the above, the fastest memory is 150. So, with 10 practice attempts, we will be responding at 150 on the 11th trial, because that memory will always be retrieved first.

If we practice some more, we will eventually sample in a faster retrieval time, say 140. After that, we will be responding at 140.

Over the course of practice, we gradually include memories with faster and more extreme retrieval times. If we plot the function of the fastest memories across practice, we get something that looks like a learning curve that follows the power law. A nice, neat, and tidy account of practice-based speed-ups in terms of instance-based memory.
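As a sanity check, the sampling story above can be simulated in a few lines. This is an illustrative Python sketch (the project's actual models are written in R), and the 500/100 ms normal parameters are just placeholder values:

```python
import numpy as np

rng = np.random.default_rng(1)

def predicted_rt(n_traces, n_sims=1000, mu=500, sigma=100):
    """Mean predicted retrieval time after n_traces practice attempts.

    Each trace contributes one normally distributed retrieval time;
    the fastest trace always wins the race, so the predicted RT is
    the minimum of the sampled times, averaged over n_sims runs.
    """
    samples = rng.normal(mu, sigma, size=(n_sims, n_traces))
    return samples.min(axis=1).mean()

practice = [1, 2, 5, 10, 50, 100, 500]
rts = [predicted_rt(n) for n in practice]

# retrieval times fall steadily with practice, tracing out a learning curve
assert all(a > b for a, b in zip(rts, rts[1:]))
```

Plotting `rts` against `practice` on log-log axes gives an approximately straight line, which is the power-law speed-up Logan's instance theory is known for.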


It would be worthwhile developing an instance-based model of typing individual letters in specific positions across words of different lengths. We can train the model on typing natural English text. Then, at various points in practice, we can get model-based estimates of mean typing speed for each position for each word length (just like we do from subjects). Then, we can correlate the model typing times with letter uncertainty as we have done for our subjects. The question is whether the instance-based memory account develops sensitivity to H. My intuition is that it will.

So here is a challenge: let's find out.

CrumpLab commented 6 years ago

I'm writing some instance theory code here:

Instance theory r markdown

wlai0611 commented 6 years ago

I tried to modify the instance theory code to apply it to typing a word, having the simulated subjects use vectors of letter RTs to type a word in the shortest time. More is explained in the RMD document in the "code" section. I couldn't get it to post here.

wlai0611 commented 6 years ago

It's called instanceTheoryTyping.rmd.

CrumpLab commented 6 years ago

Cool, I'll look at this probably early tomorrow morning.

CrumpLab commented 6 years ago

Nice work, and a good start. Let's try the following problem as a first step toward making an instance model of letter uncertainty effects by position.

Goal: Create an instance model that learns to type letters drawn from high vs. low entropy distributions.

  1. Get two letter probability distributions as follows:

High entropy (max entropy) distribution = every letter (a-z) occurs equally often. We can model this with a 26-element probability distribution, where each element = 1/26, or 0.03846154. We know H is at a maximum for this distribution: log2(26) ~ 4.70.

Lower entropy distribution (could be any letter distribution where the probabilities of letter occurrence are not equal). Let's use the frequency distribution of letters as they occur in natural English. For example, we could use the letter probabilities listed in this Wikipedia article: https://en.wikipedia.org/wiki/Letter_frequency

Or, even easier, just use the first column in Norvig's file, ngrams1.csv (it has the total frequency counts for each letter, collapsed across position etc.).

  2. Create simulated subjects who have some fixed amount of practice (e.g., have typed 10,000 letters, or 20,000 letters). If we multiply the practice amount by the probability distributions, then we get the number of times each simulated subject has experienced each letter, which also tells us how many traces each simulated subject has for each letter.

  3. Run instance model predictions for how fast each simulated subject should be for typing each letter (given their current number of traces). Do this for the high entropy and low entropy conditions.

  4. Compute mean simulated typing time for simulated subjects in the high and low entropy conditions. If instance theory is sensitive to letter uncertainty, the model should show faster mean typing times for the low entropy (natural English statistics) letter distribution compared to the high entropy (uniform) one.

  5. If we can do the above, then we can apply the model to all of the letter distributions for position and word length described by Norvig, and complete the full model.
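The steps above can be sketched end to end. This is an illustrative Python version rather than the repo's R code, and the Zipf-like distribution is a hypothetical stand-in for the English letter frequencies that would come from ngrams1.csv:

```python
import numpy as np

rng = np.random.default_rng(2)

def min_rt(n_traces, n_sims=1000, mu=500, sigma=100):
    # predicted RT for one letter: mean minimum of n_traces retrieval-time draws
    if n_traces < 1:
        return float(mu)  # no traces yet, so no retrieval speed-up
    return rng.normal(mu, sigma, size=(n_sims, n_traces)).min(axis=1).mean()

def mean_typing_time(letter_probs, practice=10_000):
    # traces per letter = practice amount * letter probability
    traces = np.floor(practice * letter_probs).astype(int)
    # instance model prediction for each letter
    rts = np.array([min_rt(n) for n in traces])
    # frequency-weighted mean simulated typing time
    return float(np.dot(letter_probs, rts))

high_entropy = np.full(26, 1 / 26)   # uniform, H = log2(26) ~ 4.70
zipf = 1 / np.arange(1, 27)          # skewed stand-in for English frequencies
low_entropy = zipf / zipf.sum()

# the model types faster under the lower-entropy letter distribution
assert mean_typing_time(low_entropy) < mean_typing_time(high_entropy)
```

The frequent letters in the skewed distribution accumulate many traces, so their predicted retrieval times are fast, and the frequency weighting means those fast letters dominate the mean, which is the intuition behind the prediction in the steps above.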

wlai0611 commented 6 years ago

what statistic does mean(save_z) give? Is it the slope of the power curve?

wlai0611 commented 6 years ago

nevermind, read the bottom

CrumpLab commented 6 years ago

No prob, it's the exponent in the power formula; it changes the slope of the curve.

Just FYI, I've been busy making an instance model to simulate letter uncertainty for all positions and word lengths. It works beautifully, so based on those simulations, we can say that instance theory predicts that typists (if they are learning according to instance theory) should be sensitive to H (across position and word length).

I haven't updated my code yet on git. It's always good if more than one person can get a model working with their own code, that way we know the solution isn't compromised by an accidental bug.

Nevertheless, will update soon in case you want to see how I've been approaching it.

CrumpLab commented 6 years ago

Check out this cool graph:

[figure: mean IKSIs, letter uncertainty (H), and instance model simulations, plotted by letter position and word length]

Here we have mean IKSIs for 350 typists by letter position and word length. We see the standard stuff, first-letter slowing and mid-word slowing.

Then, we have measures of letter uncertainty (H) from google's ngram corpus, for each letter position and word length. Looks awfully similar to the typing data. So, we are entertaining the idea that a process sensitive to letter uncertainty could explain first-letter and mid-word slowing.

What could the process be? How about an instance-based memory. The bottom panel shows simulations from an instance model trained to type letters with position and word-length frequencies consistent with English. The result: simulated letter production times (retrieval times) across position and word length are nearly identical (r^2 = .99) to letter uncertainty. Instance theory = information theory. Wow. And we have a process model with a working explanation of how typing performance could become sensitive to letter uncertainty, thereby establishing at least a plausible theoretical connection between keystrokes and letter uncertainty via instance memory.

footnote: Randy Jamieson has been saying similar things to me (and in his papers) for years about the connections between instance theory and information theory. Nice to see it check out.

wlai0611 commented 6 years ago

Yeah, the model looks like it matches our data well. So can we say that the uncertainty caused the mid-word slowing?

CrumpLab commented 6 years ago

Good question. Given what we are working with I don't think we will be able to make any strong causal claim. We would need to do an experiment for that and actually manipulate letter uncertainty across positions in different ways. All we have right now are some interesting correlations between our typing data and measures of letter uncertainty.

However, at the theoretical level I think we have a clear logical demonstration that an instance based memory process could produce first-letter slowing and mid-word slowing because of letter uncertainty.

I have some ideas about how to run an experiment as a next step, what would you do?

wlai0611 commented 6 years ago

I would get a list of words with specific positions and lengths and flash them on the screen. Then I can record their typing times at each letter position. What are you thinking of?

CrumpLab commented 6 years ago

We could do that, and then we would be able to measure typing times for each position in each word. We have already done this with the present data, where we put english words in a paragraph on screen, had people type them, then measured all of the typing times. The problem is we didn't experimentally manipulate letter uncertainty for each position across each word. Instead, letter uncertainty naturally varies across letter position for words of different lengths. To run an experiment we would need to vary the letter uncertainties associated with each letter position. This could involve making new letter strings (non-english words) with those properties (one way to skin a cat, and a good way, something we should do, but there are other ways to skin a cat too).

wlai0611 commented 6 years ago

I'm trying to put what you did into words. Let me know what you think. Here is a draft: The objective was to determine if a simulated typist would develop faster retrieval times when typing texts that were structured and predictable (like English) than when typing texts that were random and unpredictable (random strings of letters). The simulated typists learned to type according to instance theory: every time the typist typed a letter, it received a "trace" for that letter. Each trace is associated with a retrieval time, which is randomly sampled from a normal distribution. Instance theory states that every time the typist types the same letter again, it looks through the collection of traces accumulated through practice and chooses the trace with the lowest retrieval time.
To determine the relationship between the unpredictability of a text and how fast the simulated typist would type it, we had the typist type 2600 letters (the amount of practice) in 45 different conditions. Each condition had a different frequency distribution for the 26 letters and a different uncertainty value, H. The H values matched the H values of letters at all the letter positions in one- to nine-letter words.
In each of the 45 conditions, each letter of the alphabet had a different probability of occurrence.
In each condition, we multiplied the probability of each letter by 2600, to determine how many traces were developed for each letter. For example, in the condition with an H value of 3.299, the probability of typing an “E” was 15.5% so “E” in a letter distribution of 3.299 bits, would have 2600 * 0.155 = 403 traces.
For each letter in each condition, we drew a sample whose size equaled the letter's number of traces from a normal distribution with a mean of 500 ms and a standard deviation of 100 ms. We took the minimum retrieval time from the sample. For the sake of accuracy, we repeated this process 100 times and averaged the 100 minimum retrieval times for the letter in the given H condition. This process was repeated for each letter in each H condition.
Finally, we correlated the mean retrieval times with H across conditions and recorded the R-squared.
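The procedure in the draft can be sketched as follows. This is an illustrative Python version rather than the project's R code, and the ten mixture distributions here are hypothetical stand-ins for the 45 empirical position-by-word-length distributions:

```python
import numpy as np

rng = np.random.default_rng(3)

def entropy(p):
    # Shannon entropy in bits
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def sim_mean_rt(p, practice=2600, n_sims=100, mu=500, sigma=100):
    # traces per letter = practice * probability (at least one trace)
    traces = np.maximum(np.round(practice * p).astype(int), 1)
    # average of n_sims minimum retrieval times, per letter
    per_letter = [rng.normal(mu, sigma, size=(n_sims, n)).min(axis=1).mean()
                  for n in traces]
    # probability-weighted mean retrieval time for this condition
    return float(np.dot(p, per_letter))

# conditions spanning low to high H: mixtures of a skewed and a uniform distribution
skew = 2.0 ** -np.arange(26)
skew /= skew.sum()
uniform = np.full(26, 1 / 26)
conds = [w * uniform + (1 - w) * skew for w in np.linspace(0, 1, 10)]

hs = np.array([entropy(p) for p in conds])
rts = np.array([sim_mean_rt(p) for p in conds])
r = float(np.corrcoef(hs, rts)[0, 1])  # strong positive: more uncertainty, slower retrieval
```

Under this setup the higher-H conditions spread practice thinly across all 26 letters, so their frequency-weighted retrieval times are slower, and the correlation between `hs` and `rts` comes out strongly positive.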

CrumpLab commented 6 years ago

YES! This is what a first draft of the model section in our paper looks like. Good stuff.

wlai0611 commented 6 years ago

Thanks! Can I also get the code for the non_recursive_moving function? When I try to install Crump, I get an error that says the Crump package is not available for R version 3.5.0.

CrumpLab commented 6 years ago

The function is currently defined as

```r
non_recursive_moving <- function(rts){
  # interpolation table for the SD criterion as a function of sample size
  xsize <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 50, 100)
  stds  <- c(1.458, 1.68, 1.841, 1.961, 2.05, 2.12, 2.173, 2.22, 2.246, 2.274,
             2.31, 2.326, 2.391, 2.41, 2.4305, 2.45, 2.48, 2.5)
  if(length(rts) >= 100){
    sdc <- 2.5
  } else {
    sdc <- approx(xsize, stds, xout = length(rts))$y
  }
  mean_rts <- mean(rts)
  # keep RTs within sdc standard deviations of the mean
  restricted_rts <- rts[rts > mean_rts - (sd(rts) * sdc) &
                        rts < mean_rts + (sd(rts) * sdc)]
  list(original_rts = rts,
       restricted   = restricted_rts,
       prop_removed = (1 - (length(restricted_rts) / length(rts))))
}
```

wlai0611 commented 6 years ago

Thanks

wlai0611 commented 6 years ago

Also what should I change in the draft?

CrumpLab commented 6 years ago

About changing things in the draft, a good question. I don't have an answer yet, but I do have some things to say as a heads up.

Once we get a draft of the paper put together we will go into many rounds of revision. The purpose is to refine, clarify, and condense everything that we are saying so it makes sense, and is as easy as possible for readers to read. This will be painstaking, and involve sitting in front of a computer talking with me and Nick about every single sentence word for word. We're not quite there yet.

Now that we have a working model, we need to think about the ordering of major topics in our paper. For example, should we present the model first, or the data first and then the model? This will partly determine how we write these sections. I need to think more about which one should go first; right now I'm leaning toward model first, because the model justifies the plausibility of the hypothesis that we investigate in the data.

I know this isn't particularly helpful in terms of making concrete suggestions for changing any of your text.

One thing you could do is take a crack at integrating your text into the .rmd for the draft. For now, let's put the model in the introduction before the data. You can create a new subheader for the model, and then put your description of the model and the findings in that section. Let's see how that looks before editing it.

CrumpLab commented 6 years ago

After some more thinking I came up with a draft outline for the manuscript. I was originally thinking to put instance theory in the introduction, but in my outline I suggest putting it later in the paper after the analysis for Experiment 1. I may waffle on this as I think about it more. Anyway, check out my suggestions for ordering ideas in the introduction here #15

wlai0611 commented 6 years ago

I was reading over two articles from Sylvan Kornblum and Ray Hyman about changing response times while keeping the uncertainty H constant. I replicated their experiments using Dr. Crump's model. I cited their papers in the RMD file.

I fixed the H at 4.77 for 26 items and changed the probability of repetition.

Here is the code:

https://github.com/CrumpLab/EntropyTyping/blob/master/repeatVsRetrievalTimes.Rmd

If it can't run, I uploaded the rmd in "Code"

CrumpLab commented 6 years ago

Walter this is really cool. I've been away over the weekend, so just getting to this now. Will take a closer look in the morning.

wlai0611 commented 6 years ago

I was trying to use the instance model to explore the relationship between H and mean retrieval times when the probability of repetition is held constant.
I have some code here that is a start:

https://github.com/CrumpLab/EntropyTyping/blob/master/sequentialProbability.Rmd