RoboTutorLLC / RoboTutor_2019

Main code for RoboTutor. Uploaded 11/20/2018 to XPRIZE from RoboTutorLLC/RoboTutor.

Performance logging for reading tutor #285

Open octavpo opened 6 years ago

octavpo commented 6 years ago

From Jack's email:

Can you figure out how to get RoboTutor to log:

  1. each word encountered in READ (or ECHO),
  2. whether RoboTutor gave help on the word, and
  3. whether the kid tapped (for help), swiped (to keep his place), or neither? Android's GestureDetector (https://developer.android.com/reference/android/view/GestureDetector.html) presumably distinguishes the different types of touching; https://developer.android.com/training/gestures/detector.html may also help.
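On Android this discrimination is what `GestureDetector.SimpleOnGestureListener` callbacks like `onSingleTapUp` and `onFling` provide. As a platform-independent illustration, a minimal sketch of the tap-vs-swipe decision might look like the following (the class name and thresholds are hypothetical, not what Android uses internally):

```java
// Hypothetical tap-vs-swipe classifier. Thresholds are illustrative only;
// Android's GestureDetector uses its own ViewConfiguration constants.
public class TouchClassifier {
    public enum Gesture { TAP, SWIPE, NEITHER }

    private static final float SWIPE_MIN_DISTANCE_PX = 50f;
    private static final long TAP_MAX_DURATION_MS = 300;

    /** Classify a touch by its total displacement (dx, dy) and duration. */
    public static Gesture classify(float dx, float dy, long durationMs) {
        double distance = Math.hypot(dx, dy);
        if (distance >= SWIPE_MIN_DISTANCE_PX) return Gesture.SWIPE;
        if (durationMs <= TAP_MAX_DURATION_MS) return Gesture.TAP;
        return Gesture.NEITHER; // e.g. a long press with little movement
    }
}
```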
octavpo commented 6 years ago

From another of Jack's emails:

Previous attempts to measure reading performance based on tapping for help ran into two difficulties:

  1. A crash related to thread issues.
  2. Not distinguishing tapping for help from following along in the text.
octavpo commented 6 years ago

After some digging I found that the previous attempt was failing because it was trying to get context information from a place where it wasn't available. We can fix that. I didn't see anything in the previous attempt that went further than trying to display the current word and its correct/incorrect status. If we want more, I need more details about when and what we want to log. The log record structure is the following:

| Name | Type | Description |
| --- | --- | --- |
| timestamp | Long | timestamp |
| userId | UUID | one per student; chosen by FaceLogin each session |
| sessionID | UUID | one per session; generated by FaceLogin each session |
| gameId | UUID | one per new game started; generated by RoboTutor |
| language | String | tutor language |
| tutorName | String | name of the tutor, e.g. "add_subtract" |
| levelName | String | name of the level, e.g. "asm_26" |
| taskName | String | name of the task as described in the data source, e.g. "count by ten" |
| problemNumber | Int | incremented number within a game, e.g. 1, 2, 3, 4, 5 |
| problemName | String | generated based on rules for each tutor type, e.g. 2+3=5 |
| totalSubSteps | Int | total number of steps in a problem |
| substepNumber | Int | the step within a multi-step problem, e.g. 1, 2, 3 |
| substepProblem | Int | |
| attemptNumber | Int | attempt count |
| expectedAnswer | String | the expected answer from the student |
| userResponse | String | the actual answer given by the student |
| correctness | String | CORRECT or INCORRECT |
| distractors | String | |
| scaffolding | String | |
| promptType | String | |
| feedbackType | String | |

A few fields need a description from Kevin.
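For concreteness, the record above could be sketched as a plain Java class. This is illustrative only, using the field names from the table; it is not RoboTutor's actual logging class, and the UUID fields are shown as strings:

```java
// Sketch of the performance-log record described in the table above.
// Hypothetical class; field names follow the table, types are assumptions.
public class PerformanceLogRecord {
    public long timestamp;
    public String userId;      // UUID; chosen by FaceLogin each session
    public String sessionID;   // UUID; generated by FaceLogin each session
    public String gameId;      // UUID; generated by RoboTutor per game
    public String language;    // tutor language
    public String tutorName;   // e.g. "add_subtract"
    public String levelName;   // e.g. "asm_26" ("WORD" for the reading tutor)
    public String taskName;    // e.g. "count by ten"
    public int problemNumber;  // incremented within a game
    public String problemName; // e.g. "2+3=5"
    public int totalSubSteps;
    public int substepNumber;
    public int substepProblem;
    public int attemptNumber;
    public String expectedAnswer;
    public String userResponse;
    public String correctness; // "CORRECT" or "INCORRECT"
    public String distractors;
    public String scaffolding;
    public String promptType;
    public String feedbackType;
}
```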

octavpo commented 6 years ago

At this point I have software that sends a performance log message with the structure above after each word event, both when listening and when speaking (just because that's what the original code was trying to do; I'm not sure we need both). So it has the expected word, the recognized word, the attempt count, the correctness status, and all the context information (it just uses "WORD" for the level name). How do we go on?

JackMostow commented 6 years ago

Excellent! The (Swahili) Listener and the ASR itself (PocketSphinx) operate in 2 different spaces. "Text space" is the sequence of text words in the sentence, represented as either

For userResponse, can you log the actual ASR output, a sequence of 0 or more speech space words?

For expectedAnswer, please log the unpunctuated word passed to the (Swahili) Listener, because it is the lexical knowledge component we eventually want to trace using knowledge tracing, along with KCs at the syllable and phoneme levels. But the KCs can be defined after the fact.

The correctness field will tell whether the text word was accepted as read correctly, which is not simply whether userResponse = expectedAnswer.

Thanks!

octavpo commented 6 years ago

I have a new version with some improvements over the first one, although it might not address the note above. What it does differently is detect whether a match happened because of a "virtual" word inserted as help by the tutor vs. a "real" word returned by the listener. For a word generated after a touch it puts TOUCH_GENERATED in userResponse, while for a word generated after two mistakes it puts AUTO_GENERATED in userResponse. In the latter case this comes after it shows the second wrong attempt for the word that was actually recognized, so that attempt isn't lost. That is in fact the heuristic it uses to decide between the two cases, because the program runs the same code in both.
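The labeling heuristic described above could be sketched as follows. This is a hypothetical helper, not the actual RoboTutor code; the method name and parameters are assumptions, but the two labels come from the comment above:

```java
// Hypothetical sketch of the labeling heuristic: a "virtual" word credited
// by the tutor is labeled according to how it was triggered.
public class HelpLabel {
    /**
     * Returns the userResponse label for a tutor-generated ("virtual") word,
     * or null for a real word returned by the listener (log the word itself).
     */
    public static String userResponseLabel(boolean virtualWord, int priorWrongAttempts) {
        if (!virtualWord) return null;
        // Two prior wrong attempts means the tutor gave the word automatically;
        // otherwise the help was triggered by a touch.
        return priorWrongAttempts >= 2 ? "AUTO_GENERATED" : "TOUCH_GENERATED";
    }
}
```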

expectedAnswer is the string that's compared against the listener result, so it's unpunctuated. With the current userResponse, correctness is indeed userResponse = expectedAnswer, except for the two tutor-generated cases above.

Regarding the idea above about the speech space, I did some digging and here is how things work. Before sending a sequence of words to the tutor, the listener performs a lowest-cost alignment between its sequence of words and the target sequence, during which most of those extras are eliminated. For instance, a "START_word" is kept only if there is no "word" in its sequence; otherwise it is eliminated. Only the cleaned-up list is sent to the tutor (where the performance tracing takes place).
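The "START_word" cleanup rule just described can be sketched as a small filter. This is a hypothetical helper illustrating only that one rule, not the Listener's actual alignment code (which does a full lowest-cost alignment):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cleanup rule described above: a "START_word" hypothesis
// is dropped whenever the plain "word" also appears in the ASR sequence.
public class HypothesisCleanup {
    public static List<String> clean(List<String> asrWords) {
        List<String> kept = new ArrayList<>();
        for (String w : asrWords) {
            if (w.startsWith("START_")) {
                String base = w.substring("START_".length());
                if (asrWords.contains(base)) continue; // redundant partial word
            }
            kept.add(w);
        }
        return kept;
    }
}
```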

So if we want to record that original sequence, we have to pass it to the tutor too, which is not hard. But if we do that, I'd suggest putting it in a different field rather than in userResponse, so userResponse can still hold the word the comparison was done against, which might not be easy to guess otherwise. We could put the sequence in "distractors", for instance, even though it's not quite that, or please let me know if you have a better idea.

Now I also wonder if those TOUCH_GENERATED and AUTO_GENERATED labels should go in feedbackType rather than userResponse. It would be helpful if Kevin documented those fields.

PS Is it possible to add Evelyn to GitHub so she can read these comments?

kevindeland commented 6 years ago

Octav, I merged your changes into development. I'm not sure whether you were finished with your _story_readinglogging branch because you did not open a Pull Request, but I merged it anyway because we are pushing code to Mugeta tonight. If you need to make more changes, please start a new branch off of development so your code is current.

JackMostow commented 6 years ago

Octav - A separate ASR_output field seems clearest.