octavpo opened 6 years ago
From another of Jack's emails:
Previous attempts to measure reading performance based on tapping for help ran into two difficulties:
After some digging I found that the previous attempt was failing because it was trying to get context information from a place where it wasn't available. We can fix that. I didn't see anything in the previous attempt that went further than trying to display the current word and its correct/incorrect status. If we want more, I need more details about when and what we want to log. The log record structure is the following:
Name | Type | Description |
---|---|---|
timestamp | Long | timestamp |
userId | UUID | one per student; chosen by FaceLogin each session |
sessionID | UUID | one per session; generated by FaceLogin each session |
gameId | UUID | one per new game started; generated by RoboTutor |
language | String | tutor language |
tutorName | String | name of the tutor, e.g. "add_subtract" |
levelName | String | name of the level, e.g. "asm_26" |
taskName | String | name of the task as described in the data source, e.g. "count by ten" |
problemNumber | Int | incrementing number within a game, e.g. 1, 2, 3, 4, 5 |
problemName | String | generated based on rules for each tutor type, e.g. 2+3=5 |
totalSubSteps | Int | total number of steps in a problem |
substepNumber | Int | the step within a multi-step problem, e.g. 1, 2, 3 |
substepProblem | Int | |
attemptNumber | Int | attempt count |
expectedAnswer | String | the expected answer from the student |
userResponse | String | the actual answer given by the student |
correctness | String | CORRECT or INCORRECT |
distractors | String | |
scaffolding | String | |
promptType | String | |
feedbackType | String | |
A few fields need a description from Kevin.
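For reference, here is a minimal Java sketch of a record with this structure. The field names come from the table above; the class itself is illustrative, not RoboTutor's actual logging code:

```java
import java.util.UUID;

// Illustrative container for one performance-log record, mirroring the
// table above. Field names follow the table; this is not RoboTutor's
// actual class.
public class PerformanceLogRecord {
    public long   timestamp;       // event time
    public UUID   userId;          // one per student; chosen by FaceLogin
    public UUID   sessionID;       // one per session; generated by FaceLogin
    public UUID   gameId;          // one per game; generated by RoboTutor
    public String language;        // tutor language
    public String tutorName;       // e.g. "add_subtract"
    public String levelName;       // e.g. "asm_26"
    public String taskName;        // e.g. "count by ten"
    public int    problemNumber;   // 1, 2, 3, ... within a game
    public String problemName;     // e.g. "2+3=5"
    public int    totalSubSteps;   // total number of steps in a problem
    public int    substepNumber;   // step within a multi-step problem
    public int    substepProblem;  // (description pending from Kevin)
    public int    attemptNumber;   // attempt count
    public String expectedAnswer;  // expected answer from the student
    public String userResponse;    // actual answer given by the student
    public String correctness;     // "CORRECT" or "INCORRECT"
    public String distractors;     // (description pending from Kevin)
    public String scaffolding;     // (description pending from Kevin)
    public String promptType;      // (description pending from Kevin)
    public String feedbackType;    // (description pending from Kevin)
}
```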
At this point I have software that sends a performance log message with the structure above after each word event, both when listening and when speaking (only because that's what the original code was trying to do; I'm not sure we need the speaking events). So it has the expected word, the recognized word, the attempt count, the correctness status, and all the context information (it just uses "WORD" for the level name). How do we go on?
Excellent! The (Swahili) Listener and the ASR itself (PocketSphinx) operate in two different spaces. "Text space" is the sequence of text words in the sentence, represented as either:

- a string of sentence text including punctuation, or
- a sequence of text words minus punctuation immediately before or after the word, but including word-internal punctuation such as -, ', and . in an acronym -- e.g. U.S.A. becomes U.S.A because the final period is post-punctuation.
"Speech space" is a sequence of word tokens output by the ASR, which may have repetitions, omissions, noise symbols, and substitutions of the form START_word, and may also have parenthesized identifiers to distinguish alternative pronunciations of the same word. Alternative pronunciations are much less of an issue for Swahili than for English because Swahili is phonetic, but a START_word may have a separate alternative pronunciation for each truncation of the word.
For userResponse, can you log the actual ASR output, a sequence of 0 or more speech space words?
For expectedAnswer, please log the unpunctuated word passed to the (Swahili) Listener, because it is the lexical knowledge component we eventually want to trace using knowledge tracing, along with KCs at the syllable and phoneme levels. But the KCs can be defined after the fact.
The correctness field will tell whether the text word was accepted as read correctly, which is not simply whether userResponse = expectedAnswer (for instance, the raw ASR output may contain extra tokens such as START_word alongside an accepted word).
Thanks!
I have a new version that has some improvements over the first one, although it might not address the note above. What it does differently from the first version is detect whether a match happened because of a "virtual" word inserted as help by the tutor versus a "real" word returned by the listener. For a word generated after a touch it puts TOUCH_GENERATED in userResponse, while for a word generated after two mistakes it puts AUTO_GENERATED in userResponse. In the latter case this happens after it logs the second wrong attempt with the word that was actually recognized, so that information isn't lost. That timing is in fact the heuristic it uses to decide between the two cases, because the program runs the same code in both.
expectedAnswer is the string that's compared against the listener result, so it's unpunctuated. With the current userResponse, correctness is indeed userResponse = expectedAnswer, except for the two tutor-generated cases above.
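For what it's worth, here is a rough sketch of that labeling heuristic as described above. The class and parameter names are made up for illustration; this is not the actual RoboTutor code:

```java
// Illustrative sketch of the userResponse labeling described above.
// Names are assumptions, not RoboTutor's actual code.
public final class CreditLabeler {
    public static String labelUserResponse(String heardWord,
                                           int priorWrongAttempts) {
        if (heardWord != null) {
            return heardWord;            // "real" word returned by the listener
        }
        // "Virtual" word inserted by the tutor. Per the heuristic above,
        // two prior wrong attempts imply the tutor auto-advanced; otherwise
        // the student must have tapped the word for help.
        return priorWrongAttempts >= 2 ? "AUTO_GENERATED" : "TOUCH_GENERATED";
    }
}
```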
Regarding the idea above about the speech space, I did some digging and here's how things stand. Before sending a sequence of words to the tutor, the listener performs a lowest-cost alignment between its word sequence and the target sequence, during which most of those extras are eliminated. For instance, a "START_word" is only kept if there's no "word" in its sequence; otherwise it's eliminated. Only the cleaned-up list is sent to the tutor (where the performance tracing takes place).
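Here is a minimal sketch of just that START_word rule (illustrative only; the listener's real cleanup is a lowest-cost alignment, which this does not reproduce):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of one cleanup rule described above: a START_word
// token survives only if the bare "word" is absent from the hypothesis.
public final class HypothesisCleanup {
    public static List<String> dropRedundantStarts(List<String> tokens) {
        Set<String> heard = new HashSet<>(tokens);      // all tokens as heard
        List<String> kept = new ArrayList<>();
        for (String tok : tokens) {
            if (tok.startsWith("START_") && heard.contains(tok.substring(6))) {
                continue;   // eliminated: the full word was also recognized
            }
            kept.add(tok);
        }
        return kept;
    }
}
```

For example, dropRedundantStarts on ["START_mama", "mama"] would return ["mama"], while a lone ["START_mama"] would be kept.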
So if we want to record that original sequence, we have to pass it to the tutor too, which is not hard. But if we do that, I'd suggest we put it in a different field rather than in userResponse, so that userResponse still holds the word the comparison was done against, which might not be easy to guess otherwise. We could put the sequence in "distractors" for instance, even though that's not quite what the field means, or let me know if you have a better idea.
Now I also wonder if those TOUCH_GENERATED and AUTO_GENERATED labels should go in feedbackType rather than userResponse. It would be helpful if Kevin documented those fields.
PS Is it possible to add Evelyn to GitHub so she can read these comments?
Octav, I merged your changes into development. I'm not sure whether you were finished with your _story_readinglogging branch because you did not open a Pull Request, but I merged it anyway because we are pushing code to Mugeta tonight. If you need to make more changes, please start a new branch off of development so your code is current.
Octav - A separate ASR_output field seems clearest.
From Jack's email:
Can you figure out how to get RoboTutor to log: