RoboTutorLLC / RoboTutor_2019

Main code for RoboTutor. Uploaded 11/20/2018 to XPRIZE from RoboTutorLLC/RoboTutor.

DISCUSS what code, content, and lessons to adapt from MathPlanet #303

Open · JackMostow opened 6 years ago

JackMostow commented 6 years ago

Continuation of email discussion with Nirmal -- add yourself to Assignees to subscribe to updates.

JackMostow commented 6 years ago

Including some labels in a new issue often causes "There was an error creating your Issue: labels is invalid." Adding them once the issue is created works OK.

JackMostow commented 6 years ago

Nirmal - I was intrigued by your statement that "We also did this experiment on winning and losing where we found that close wins/losses were likely to be more engaging than far out wins/losses." What constitutes "close wins/losses" in a (presumably single-player) game?

nirmalpatel commented 6 years ago

So we introduced winning and losing in Battleship for this experiment, because until then most Battleship play was voluntary: students played until they quit. It looked like this:

[image]

So at the end of the level, students were told whether they won or lost.

Winning and losing were judged against a goal, e.g. scoring 70% correct or more counted as a win.

So a close win would be getting 70%, and a far win would be getting 100%. A close loss would be getting 65%, and a far loss would be getting 30%.
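
For concreteness, here is a minimal sketch of how outcomes could be bucketed this way. The goal and the 10-point "closeness" margin are illustrative values chosen to match the examples above, not necessarily what Battleship actually used:

```python
def classify_outcome(percent_correct, goal=70, close_margin=10):
    """Bucket a level result into a close/far win/loss relative to a goal.

    `goal` and `close_margin` are illustrative values chosen to match the
    examples above (70 -> close win, 100 -> far win, 65 -> close loss,
    30 -> far loss); Battleship's actual margins may have differed.
    """
    distance = percent_correct - goal
    if distance >= 0:
        return "close win" if distance <= close_margin else "far win"
    return "close loss" if -distance <= close_margin else "far loss"
```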

And we saw these close/far wins/losses having an effect on player engagement:

[image]

nirmalpatel commented 6 years ago

We showed the goal like this:

[image]

JackMostow commented 6 years ago

Interesting! I get the concept; is "remaining items" the number done after the end-of-level report? Thoughts:

  1. Is the reported % the actual % correct or just an independent motivational variable?
  2. RoboTutor teaches basic literacy and numeracy, so we can't rely on text messages or use percentages. But we use spoken (Swahili) prompts and integer scores. If we use our mastery threshold for promotion as a goal, winning will advance to the next activity.
  3. What if any actual % correct do you aim for? Someone from HeadSprout told me they aim for 90%.
  4. What win/loss % do you expect (or know) will maximize engagement -- and why?

nirmalpatel commented 6 years ago

I think I lied to you when I said 4 to 8 hours per week... now I'm doing this whenever I can... it's so much fun!

  1. Remaining items are the number of items players played after the end-of-level report.
  2. The reported % correct is the actual % correct.

Derek notes in his thesis that the win/lose condition had higher engagement (mean comparison) than the condition where we just reported % correct and said nothing about winning or losing. There seems to have been a third condition (called control) about which I don't remember anything, but here's the result:

[image]

  3. There are different opinions about % correct, but from people I work with closely at a giant textbook company, I've heard that classroom teachers like to think that when kids get over 70% in some knowledge area, they are out of the danger zone (of course, this depends on whether they got 70% on the difficult items or the easy ones, and this is where I like to slip into IRT)

  4. Surprisingly, we did not find the goal % to affect overall engagement (total number of trials played).

[image]

The numbers 40, 60, 80, and 100 are the goal percentages.

One of the reasons for this might be that the You Win! (you get a trophy with stars flying around) and You Lose! (you see a trophy and it falls off, literally) animations are washing out whatever effect the goal percentages had. I would guess that goal percentages add to the perceived difficulty of the level, and we've found repeatedly that difficulty affects engagement (though not always in the inverted-U way; in very early experiments we found that kids in Battleship liked easy levels).

There was also a level-unlocking screen shown before the level started, and maybe that drove student engagement regardless of the goal:

[image]

Arrr! I love these Robo Pirates...

JackMostow commented 6 years ago

  1. Emphasis on winning (by rewarding it) worries me if it encourages performance orientation rather than mastery orientation. Does it?

  2. If so, how can we modify it to encourage mastery orientation -- i.e. learning as opposed to high scores?

  3. Would it help to define "winning" by improvement on previous (or previous best) score rather than (only) by reaching a specific performance?

  4. What within-subject experiment(s) could detect the effect of such decisions on engagement?

  5. What would be plausible comparison conditions for each such experiment? E.g.:

    a. Just display the score, as at present.

    b. Verbal praise for surpassing the previous (or previous best) score on the same activity.

    c. When students "lose," reassure them (as Duolingo does) that they learn from mistakes.

Thanks. - Jack

nirmalpatel commented 6 years ago

Ideas borrowed from our ongoing work with some researchers on intrinsic and extrinsic motivation.

  1. I think you might be right -- rewarding students for getting their scores above a certain percentage threshold might act as a good extrinsic motivator.

  2. From our discussions, I found two interesting ways to encourage mastery orientation:

  3. Yeah, I think so; defining "winning" by improvement on the previous best score might give "winning" a mastery orientation.

4 and 5. Here is a small experimental design to see how encouraging mastery v/s performance orientation might affect students:

In every session, students would randomly get assigned one of the three conditions:

[image]

[image]

[image]

For each of the conditions, we can see how much more or less than average time students spend on different content items (a potential confound is students' personal preferences, like liking some poems more than others). Derek also operationalized persistence as something similar, I guess.
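
As a minimal sketch of the assignment mechanics (the condition names are placeholders, since the actual conditions are shown only in the screenshots above):

```python
import random

# Placeholder condition names -- the real conditions are in the screenshots above.
CONDITIONS = ["mastery_framing", "performance_framing", "control"]

def assign_condition(student_id: str, session_id: str) -> str:
    """Randomly assign one of the three conditions for a session.

    Seeding on the (student, session) pair makes the assignment
    reproducible when the analysis is rerun later.
    """
    rng = random.Random(f"{student_id}:{session_id}")
    return rng.choice(CONDITIONS)
```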

I don't know if that makes a lot of sense; I'm fairly new to experimental design...

Thanks!

-- Nirmal

nirmalpatel commented 6 years ago

So I sat down with our lead Math Planet (MP) designer Chandradip Rana (he studied Toy & Game Design at the National Institute of Design, India) and we went through many things, including his master's thesis, to identify design elements of MP and its games that could be increasing engagement.

I will try to arrange the design elements, lessons, and decisions into certain themes.

Extrinsic motivators:

[image]

Showing how many collectibles students have collected out of the total can motivate them to collect more things. For example, "Oh! I can still collect 10 more stars from this level!"

Not really sure about avatars -- they might act more like a collectible if there is nothing you can do with them (like feeding your avatar burgers and making it fat).

Bonus round in the Bubble Pop might encourage kids to play the level to get the free goodies at the end.

[image]

We also had a spinner in a kid app we made called LearnWorld.

Math Planet has a leaderboard:

[image]

Intrinsic motivators:

Rewarding for persistence and focus (3/5/10 correct in a row):

[image]

In MP, a robot comes up saying "Great job!", "You did well!", "Awesome!", "You're great!" etc. every time you finish up a planet:

[image]

Diversity:

Rana says that diversity could be one of the key elements of MP that drives engagement. Math Planet has 9 different games and 100 different mechanics, which means that in every level you enter, you are likely to experience different interactions and learning objectives. Each game has its own aesthetic style and several different mechanics, which makes the experience quite unique. For example, in Party Time (PT), we have different ways of doing the same things: kids can distribute chocolates or they can distribute pastries. Then there is a mechanic for cutting cakes with a knife, and there is a slice-o-meter. In a nutshell, having diversity in interactions could be one of the causes behind the engagement in MP.

Distributing cakes v/s muffins:

[image]

[image]

So we can have the same mechanic, and just change the objects within it.

Different types of cake, because kids got bored cutting the same one:

[image]

Develop more detail for the characters/objects. For example, the Battleship Numberline ship also has a partial-hit state where smoke comes out of it:

[image]

Feedback:

In Party Time (PT), we never tell students that they answered incorrectly; after a student answers incorrectly 3 times, we simply jump to the next question.
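
A sketch of that feedback rule in code (hypothetical, not MP's actual implementation; only the 3-attempt limit comes from the description above):

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # "after doing something incorrect 3 times"

@dataclass
class ItemState:
    attempts: int = 0

def on_answer(item: ItemState, correct: bool) -> str:
    """PT-style feedback: never flag an answer as wrong; after the
    attempt limit, silently advance to the next question."""
    if correct:
        return "advance"
    item.attempts += 1
    if item.attempts >= MAX_ATTEMPTS:
        return "advance"  # move on without negative feedback
    return "retry"        # give the student another chance
```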

PT characters give emotional feedback.

[image]

Interactions:

Some interactions could be more difficult than others. For example, tapping is easy, but dragging a bubble onto another might be hard.

[image]

Possible design changes in RT:

So these are some of the things we discussed.

Cheers!

-- Nirmal

JackMostow commented 6 years ago

Nirmal - That's a lot to chew on! Thoughts: It would be nice to know which of these features are supported by evidence. 100 is a lot of mechanics! Art is very specific; I strive for generality.

nirmalpatel commented 6 years ago

So, there is no evidence for any Math Planet features, because we never ran RCTs. All the insights are more the personal design decisions of Derek and the design team, and need to be tested (during my Ph.D.)

"100 is a lot of mechanics!" -- what might be happening is that because we have a big menu of mechanics, even if kids don't like some of them, they end up engaging with others.

I must say, doing game design is like diving into the pit of art while holding onto the book of generality. Somehow, the art and aesthetics of our learning games are inseparable from their design. I also remember spending many hours writing game code that was both efficient and aesthetically pleasing.

We have 2 years' worth of Math Planet data (very close to the DataShop format) which nobody has time to analyze right now (but we can find that time if it helps improve RT).

I hope my work is useful for our project; please let me know if you need me to focus on something specific.

Thanks!

nirmalpatel commented 6 years ago

I should also mention -- since this is of general importance to the learning game design community -- that Derek and I (along with some ETS folks) are working on Planout4Edu, an experimentation engine that lets us easily test the effectiveness of different design choices in digital learning environments. It is very hard to run experiments and gather evidence about design choices in production digital learning software. I really wish I could tell you whether every little thing in Math Planet worked (in terms of learning gains or other relevant outcome metrics)... sigh...

At least we published a workshop paper in Bruce McLaren's CHI workshop this year about Planout4Edu. Here's a link to it: https://drive.google.com/file/d/1bjNZqe4NAh2PZutcpuRgGHtEXsCgBb-B/view

JackMostow commented 6 years ago

(Judith and Kevin - I added you to this issue so you can follow the discussion if you like; if not, let me know.)

There's a lot to get my head around! Thoughts:

  1. We assume we need more games because RoboTutor has only 2.5 game mechanics (BubblePop floating, BubblePop rising, and Akira), though kids might perceive some of its other activities as games -- perhaps anything with repeated tapping.

  2. Your games use text prompts and praise. Ours don't because we can't assume our users can read.

  3. We both use expressive recorded speech. Ours comes from two Swahili speakers.

  4. Your games have animated lip-syncing characters. Our Persona is just an animated pair of eyes intended to represent a silent companion that reacts to the tutorial interaction. The current version just blinks sporadically to appear animate, and gazes wherever the action is to convey attentiveness. We have a repertoire of 11 more expressive silent animations, including eyebrow-expression GIFs.

    a. We plan to play some of them when emotionally appropriate in response to tutor prompts, as specified in the Expressions column in Swahili translations.

    b. The "Response goals and opportunities sheet" in Mentor Table specifies some tutor events (e.g. correct response, hesitation, finish activity) as opportunities for persona responses to serve particular goals (e.g. increase engagement, foster growth mindset, build rapport).

    c. The "Responses to emotions" sheet in Mentor Table proposes Persona responses to mirror or react to kids' emotions, which we plan to infer from automated detection of their facial expressions.

  5. I doubt that it's feasible to reuse your code, audio assets (other than sound effects), or graphical assets (other than objects, but I don't expect kids in Tanzania to recognize cupcakes or submarines).

  6. I'm more optimistic about adapting your game logic.

  7. I'm even more optimistic about applying some of your design heuristics, especially if supported by evidence, e.g.:

    a. Define "win" and "lose" in terms of achieving a tangible goal adjusted to make wins and losses close.

    b. The optimal number of items in each activity before moving to a different activity is ~10, which already happens to be the length of almost all our item sequences.

    c. Use spoken praise and encouragement to foster intrinsic motivation. RoboTutor already gives spoken and graphical feedback to correct answers. It should also praise improvement over previous performance. We already have narrated prompts for this purpose, but would need to add code to do the bookkeeping necessary to detect improvement and play those prompts when it occurs.

    d. I've been using DuoLingo to learn Swahili, and notice that after incorrect responses it gives encouragement about learning from mistakes. What if anything do your games do to encourage kids when they don't succeed, other than giving them more chances?

    e. We already have variety by randomly selecting the backdrop and background music for each BubblePop activity, and the type of object (e.g. banana, pineapple) created by each tap in a CountX activity.

    i. Replacing bubble popping with a different object and audiovisual effect in BubblePop would require new images, animations, and sound effects -- and they'd still have to be big enough to fit the same text and images.

    ii. Changing vehicles in Akira from racecar to bicycle, motorcycle, bus, elephant, etc. seems fairly easy. Changing the backdrop seems harder. Changing the background music would require finding a suitable one. Changing the layout (# or shape of lanes) seems like too much work.

    iii. I've eschewed adding music or complex backgrounds to tutorial activities like reading, writing, and arithmetic so as not to distract from the prompts or task.

  8. Developing additional game mechanics seems like a lot of work that we'd need to amortize over many different activities.

  9. We need to add or improve tutors to better address educational goals such as teaching place value, arithmetic, word problems, sentence writing, reading comprehension, and phonological awareness. The UI/UX for these tutors needs to be driven by cognitive criteria such as intuitive concrete representations. So better levels at which to apply your knowledge of game design are:

    a. Low level: superficial variety such as i. varying objects and sound effects ii. varying the backdrop iii. varying the background music, if any iv. what else?

    b. Middle level: game logic such as i. allowing multiple attempts ii. praising success, completion, and improvement iii. giving consolation and encouragement after failures iv. what else?

    c. High level: gamification i. Casting the activity as a contest to win or lose ii. Keeping and reporting a score iii. I don't like extrinsic rewards: they cost our time to develop and kids' time to learn; I prefer intrinsic to extrinsic motivation; and I don't know (but you may) of evidence that they increase learning gains or at least engagement (picking an activity, completing it, and repeating it). iv. What else?

More than enough for now! - Jack

nirmalpatel commented 6 years ago

Okay! Now a lot for me to go through!

  1. Here are the games (or game mechanics) you said you'd like to adopt from Math Planet (April 26 email). I also estimated the time to code each game/mechanic if I were doing it in ActionScript 3.0 (which is very similar to Java) and Flash (which provides a vector-based framework for still and animated graphics) -- that is, if I were still the 2nd-year undergrad I was then, with the whole summer free:

    • Party Time: "Give Scuba 1 more cupcake than Mr. S" (addition) (8 to 12 days, hard parts are dragging and adjusting cake size as more cakes are added on the plate)
    • Battleship: Pick numeral for specified position on the number line (number estimation) (5 days, tap interactions are easy to do)
    • Crazy Trains: Count the number of objects going by (counting) (5 to 8 days, includes a dynamic animation based on train count, user input is easy to take)
    • Jelly Beans: Add dot bags and arrays (addition) (8 to 12 days, need to dynamically build dot bags and arrays, user input is easy to take)
    • Math Facts: Inverse addition (5 + ? = 8) (addition) (5 days, just need to display question and collect student's answer)
    • Place Value: Tap the digit on the specified place (1s, 10s) (place value) (8 to 10 days, need to dynamically build separate digit graphic objects so that we can detect the tap on each digit)
    • Place Value: Incr/decrement each place to add/subtract 2-digit numbers (place value, addition, subtraction) (8 to 10 days, need to build on top of the digit graphic representation)

    I can help with building the game-mechanic pseudocode for any of these games.

    I'm not sure how different the RT platform is, so it would be great if someone from the RT team who has worked on the ActionScript 3.0 / Flash platform before could give me a rough idea about the similarities and differences.

  2. I really like the audio feedback that RT has; we will try to include it in the next version of MP. I think some teachers asked for it too.

  3. Really glad to see kids singing along with the poems in RT!

  4. Love the expression GIFs!

    a. Playing expression GIFs during tutor prompts sounds great!

    b. Playing expression GIFs in response to different tutor events sounds great!!

    c. Mirroring kids' expressions using expression GIFs might act as a seductive distractor -- i.e. it might cause more engagement, but maybe at the cost of lower learning gains. In my work with data from gamified platforms, I've seen kids literally 'running after' coin rewards and getting distracted. In some early studies of textbook reading and retention, people found that seductive distractors led to less retention of important details. For a short review of seductive distractors, see page 2 of https://link.springer.com/article/10.1007/s40593-015-0044-1

    I also remember you telling me that kids love taking selfies, and if they find the eyes to be a reflection of themselves, they might spend the tutor time trying to manipulate the expression GIFs.

  5. It sure seems tough to re-use the code, but I can surely help with the game pseudocode (which can be a big part of the coding, because once the pseudocode is decided, you just have to rush and implement it).

  6. So Math Planet does not have elaborate sequencing logic (except in the Math Facts game, where I implemented BKT). But we can totally rip off the game-mechanic logic (how to handle the interactions, when to fire certain events, how to handle those events, etc.)

  7. Yes! We can try our best to leverage the results of the Battleship experiments.

    a. So Derek put it a better way: mastery orientation involves "internal approval" whereas performance orientation involves "external approval."

    It seems difficult to give tangible goals and adjust them to make wins and losses seem close. For example, if we say "get 5 correct to win," then a kid who gets just 1 correct is in no way going to be convinced that it was a close loss. Maybe we can say "get 5 correct to win" and let kids accumulate correct answers over multiple plays of the level -- that way, we can encourage them: they can win, they just need to come back and try again, and what they have gathered will not go away.

    To decide how many correct answers are required for winning, maybe we can use the probability-of-mastery formula backwards and see how many correct answers are required to boost the mastery probability by a reasonable amount (like .2); see the sketch after this list.

    b. 10 items seems like a great level length!

    c. Praise for improvement sounds pretty good to me.

    d. In Battleship, we give extra scaffolding when kids answer incorrectly. Like this:

    And if they answer incorrectly two more times, we just tell them what the correct answer is. (Oh wow, we have data from a 4-year-old worked-example scaffolding study in a Pittsburgh school that nobody has analyzed, hahaha; that was my HCII summer internship.)

    e. RT has bananas and pineapples?! I really need the APK to see these things! @kevindeland If you get some time, can you please point me to the debug-version install instructions?

    i. Maybe we can add more colors to the bubble color palette. We can also use open-source textures and see how they look.

    ii. For Akira, maybe we can vary building colors/lighting. We can try changing the color of the lanes if that's not too much work (e.g. yellow and black).

    iii. Yes, I agree on not adding complex backgrounds and music, so that kids can focus on the skill-building activity.

  8. So how many game developers are on the RT team? Maybe if we manage the time correctly, we might be able to build a reasonable number of new things. I am totally up for helping with the pseudocode of the games. All MP games are written in the Starling framework, which uses sprite sheets and raster graphics.

  9. Great points! If any new ideas cross my mind, I'll write back.
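
Regarding 7a above, here is a rough sketch of using BKT "backwards" as suggested: simulate correct answers until the mastery probability rises by the desired amount. The guess/slip/transit values are illustrative, not parameters fit to MP or RT data:

```python
def bkt_update_correct(p_mastery, guess=0.2, slip=0.1, transit=0.1):
    """One standard BKT update after a correct answer.
    Parameter values here are illustrative only."""
    evidence = p_mastery * (1 - slip)
    posterior = evidence / (evidence + (1 - p_mastery) * guess)
    return posterior + (1 - posterior) * transit

def corrects_needed(p_start, boost=0.2, **bkt_params):
    """How many consecutive correct answers raise mastery by `boost`?"""
    p, n = p_start, 0
    while p < min(p_start + boost, 0.99):
        p = bkt_update_correct(p, **bkt_params)
        n += 1
    return n

# e.g. corrects_needed(0.4, boost=0.2) -> number of corrects needed
# to move a student from 0.4 to ~0.6 mastery probability.
```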

Phew! I'd rather be at CMU. I'd love to get in touch with the dev team. Do students implement individual games and then there is someone who integrates them? That's how Math Planet works.

Good day!

-- Nirmal

judithodili commented 6 years ago

Please can you keep these conversations on github? Somehow I'm still getting CC'd. [Jack edited this post to delete the long quoted email.]

JackMostow commented 6 years ago

@judithodili - The conversation is on GitHub. It emails new posts automatically to assignees. At the bottom of the email is a link to "view it on GitHub" so as to avoid long nested emails like yours.

I included you in this issue because you expressed interest in adding games. Should I unassign you?

JackMostow commented 6 years ago

@nirmalpatel - I'm packing for Tanzania so will respond to just a few points:

  1. I love your enthusiasm!

  2. How "baked together" are your graphics? For instance, what would it take to remove all the text?

  3. Are your item sequences "baked in," or are they (like ours) external inputs to parameterized code?

  4. What do you think of responding verbally to different cases as follows? (A rough sketch of this case logic follows the list.)

    • "Winning": beating your previous best score on this activity. "Great! You beat your previous best!"
    • "Tying": matching your previous score. "See if you can do even better next time!"
    • "Close loss": approaching your previous best. "That was close! You almost won!"
    • "Far loss": scoring below your previous best. "Too bad! You can learn from mistakes. Keep trying and you can do better next time!"
    • "Failing": scoring at chance and getting demoted. "That was too hard! Let's try something easier."
    • "Graduating": reaching mastery level on the current task and advancing to the next task/level/topic.
    • "Congratulations! You're ready for something different/harder/new!"
  5. Glad you like Cheul's GIFs! They illustrate the minimalist aesthetic I like -- simple yet expressive.

  6. Interesting distinction between the expected effects of an on-screen persona responding to tutor prompts and events vs. to kids' facial expressions. The rationale for emotional responsiveness in the form of mirroring or otherwise reacting visibly in real time to detected expressions is to hook the kid, build rapport, and make the kid care about RoboTutor as a partner, and thereby increase engagement so as to use RoboTutor more often (session frequency), longer each time (session duration), and keep using it (longevity). If reacting to kids' facial expressions turns out to be a seductive distraction, we'll need to react more subtly, e.g. by making tasks or items easier or harder to alleviate frustration or boredom.
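
Returning to point 4, here is a minimal sketch of the case logic behind those prompts. The ordering of the checks and the one-point "close loss" margin are assumptions; the actual thresholds would come from RoboTutor's promotion/demotion rules:

```python
def feedback_case(score, previous_best, chance_level, mastery_level):
    """Map an integer score to one of the verbal-feedback cases above.

    The closeness margin and the ordering of the checks are assumptions,
    to be tuned against RoboTutor's promotion/demotion logic.
    """
    if score >= mastery_level:
        return "graduating"
    if score <= chance_level:
        return "failing"
    if score > previous_best:
        return "winning"
    if score == previous_best:
        return "tying"
    if previous_best - score <= 1:  # illustrative closeness margin
        return "close loss"
    return "far loss"
```

The only bookkeeping this needs is the per-student, per-activity previous-best score.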

Back to packing....

nirmalpatel commented 6 years ago

  1. I love doing this!

  2. Our graphics are not baked together; we have Flash files for each game. We can remove text easily (I'd say a few hours per game, and for what we need, 1 hour per game should suffice).

  3. Item sequences are in XMLs. I can parse them out as required.

  4. I think these prompts are good. I wonder if someone knows how to word them in a more evidence-based way, but they look great to me. They also have a conversational tone.

  5. Minimalism is hard to get right, I hope the GIFs make it to the kids soon!

  6. I guess you can try the mirroring in your beta sites before you decide to release it to the XPRIZE sites?

Have a good trip!

judithodili commented 6 years ago

Hey Jack - please can you unassign me from this thread?

From a development perspective, the main things we need to build this summer are games/apps that teach word problems and listening comprehension... as well as addition with carry and subtraction with borrow. Until those apps are specified and built, I can't handle anything else cognitively...

However, if you have conversations related to addressing those specific domain areas... please tag me so we can get started :)

Judith

nirmalpatel commented 6 years ago

@JackMostow Derek and I were discussing goal-setting in RT. I'm not sure how difficult it would be to implement, or whether goal setting is age-appropriate for the kids. But here are some ideas.

Kids can set their own goals, such as the following:

  1. I want to help RoboTutor keep his/her energy meter filled up! (The meter goes down when students are 'disengaged')
  2. I want to learn how to read! (The RoboTutor then praises kids for performing well in the reading activities)
  3. I want to learn how to do math!

Duolingo has a daily goal setting option when you sign up:

[image]

JackMostow commented 6 years ago

Seems over-ambitious for young kids' metacognition, and non-trivial to integrate into the stateflow and UI/UX. Ideal candidates to support engagement:

  1. Fit smoothly into existing user experience
  2. Are easy to implement and integrate into RoboTutor's architecture and code
  3. Are likely to help the target population, based on strong empirical and/or theoretical reasons

nirmalpatel commented 6 years ago

Ok, now I get the email notifications!

Yeah, I had a similar concern during the conversation, that the goal-setting activity might not be appropriate for the target population.

I'm going to be diving a lot more into the RT data next week -- so feel free to send questions my way!

JackMostow commented 6 years ago

Ignore any data before version 1.8.9.1. What sorts of analyses do you have in mind? Does the data in DataShop suffice? It's parsed from the PERF logs but considerably cleaned up and enhanced. Which analyses, if any, need the messier VERBOSE logs that include state transitions in the animator graphs?

nirmalpatel commented 6 years ago

Cool, I'll stick to the latest version.

I still need access to the DataShop dataset.

I did some analysis of the animator graph using PERF data here: https://github.com/RoboTutorLLC/DataAnalysis/blob/master/nirmal_perf_data_usage_analysis_v1.ipynb. Not sure if you had time to review it before.

nirmalpatel commented 6 years ago

Just brainstorming on the kinds of analyses we can do:

  1. Item analysis (classical stats and/or IRT) for different levels -- to find out which items are extremely hard or easy, and whether they need to be modified
  2. A Markov chain sequence analysis to see how kids transition between the different tutor choices they have (sketched below)
  3. Understanding how learners are pacing through the animator graph -- this might tell us whether we have enough content, whether it is possible for some students to never finish, and whether there are any bottlenecks in the animator graph
  4. Back-button press analysis to see levels where kids bail out a lot (I think Evelyn did this already)
  5. Reaction-time analysis to identify items where learners take an unusually long time to answer
  6. If kids are playing RT for a voluntary amount of time, then we can do a survival analysis to discover factors that make them end their session quickly
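
For idea 2, a sketch of computing an empirical transition matrix over tutor choices; the column names (student_id, timestamp, activity) are guesses at the PERF/DataShop schema:

```python
import pandas as pd

def transition_matrix(events: pd.DataFrame) -> pd.DataFrame:
    """Empirical Markov transition matrix between tutor choices.

    Assumes one row per tutor launch, with columns 'student_id',
    'timestamp', and 'activity' (the names are schema guesses).
    """
    events = events.sort_values(["student_id", "timestamp"])
    nxt = events.groupby("student_id")["activity"].shift(-1)
    counts = pd.crosstab(events["activity"], nxt)  # drops end-of-sequence NaNs
    return counts.div(counts.sum(axis=1), axis=0)  # rows sum to 1
```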

JackMostow commented 6 years ago

Comments on https://github.com/RoboTutorLLC/DataAnalysis/blob/master/nirmal_perf_data_usage_analysis_v1.ipynb: Lots of analyses!

  1. Can you display integer times and human-readable timestamps rather than 1.524317e+12 | 1.524317e+12? (See the pandas sketch at the end of this list.)

  2. Please refer to logged events as "log entries" (or events), and use "logs" to refer to log files.

  3. Please exclude data from 1.8.8.1 because it might be affected by bugs in tutor function or logging. For instance, perhaps they explain the NA game IDs. For data-driven design iteration, we generally care only about data from the newest version.

  4. The activity distributions reflect startup effects, especially in the wake of a new FaceLogin that started kids over at the beginning. Computing completion rates and mean durations will still be weighted by these lopsided distributions, but will at least mitigate startup effects in computing student preferences.

  5. I hope the fields (e.g. gameID, tutor) are documented in the DataShop data set.

  6. Usage statistics over time are perplexing -- good catch! They may reflect delays in uploading data. If you regenerate them, you may find data for days that previously had none. But the huge gaps might have other causes. Can you look up the number of log files uploaded to FROM TABLETS, if any, for those dates? If they're there but not in the data set, they may simply be awaiting parsing, import into DataShop, and export to a file.

  7. What you're calling the "animator graph" is actually the transition matrix for a content area (literacy, stories, math), which specifies the progression from each activity in that area to the next. The "animator graph" specifies the sequence of internal states within an activity, logged in the VERBOSE logs. So the analyses are mislabelled but still informative: they measure traffic through the curriculum. They reflect how far each kid has gotten, and how many times the same kid has repeated an activity, which would be informative to analyze in itself. It would be even nicer to distinguish voluntary repetition of an activity from repeating it after failing to perform it at mastery level (I think 83%, which translates to 9 items in a 10-item sequence).

  8. As you surmised, the same kid may operate under multiple student IDs due to rampant reenrollment. Also, kids may sometimes have logged in as another kid. The manual usage logs show which kid used which tablet when. Bo Jiang has tried to reconstruct the mapping from student IDs to actual students. I doubt we'll ever get a perfect mapping, but it's better than pretending the mapping is one-to-one.
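
Re point 1, here is one way to do it in pandas, assuming the times are epoch milliseconds stored as floats (the column name is a guess at the log schema):

```python
import pandas as pd

df = pd.DataFrame({"client_timestamp": [1.524317e12]})  # column name is a guess

# Show integer epoch milliseconds instead of scientific notation:
df["client_timestamp"] = df["client_timestamp"].astype("int64")

# Better yet, render a human-readable timestamp:
df["time"] = pd.to_datetime(df["client_timestamp"], unit="ms")

# Alternatively, change the global float display format:
pd.set_option("display.float_format", "{:.0f}".format)
```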

Thanks! - Jack

nirmalpatel commented 6 years ago

Alright! I will incorporate these comments into the next analyses. We are working on seeing how scores change over time when kids replay the same tutors.

Meanwhile, we have taken a stab at the VERBOSE logs. There are approximately 1.6 million events in them, and in total we have close to 800 fields among them. Some of the fields are not written correctly (for example, half of the tutor names end up in the field name), but there are no parsing errors. We have flattened the events into one giant data frame.
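
In case it's useful, here is a sketch of how such flattening can be done, assuming each VERBOSE log is a file of JSON event objects, one per line (the real format may differ):

```python
import json
import pandas as pd

def load_verbose(path: str) -> pd.DataFrame:
    """Flatten a VERBOSE log into one wide data frame.

    Assumes one JSON event object per line; json_normalize expands
    nested objects into dotted column names, which is how ~800
    distinct fields can appear across events.
    """
    with open(path) as f:
        events = [json.loads(line) for line in f if line.strip()]
    return pd.json_normalize(events)
```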

Here are two main findings:

  1. Most of the log entries do not have a student ID or session ID in them, making it very hard for us to associate them back to the PERF logs. For example, here are the VERBOSE events from bpop.ltr.uc:A..Z.vow.asc.all.stat.noShow.25: https://docs.google.com/spreadsheets/d/1BR5EBp42jQPg0TGATIxtlJLkFeUy_apevJepkWQrd4Q/edit?pli=1#gid=828053024 There are very few student IDs in this file, and most of the rows do not have any. But maybe there is a way to figure out how the animator graph works from this file?

  2. Here are the error description messages and their counts that we found in the VERBOSE log files: https://docs.google.com/spreadsheets/d/1BR5EBp42jQPg0TGATIxtlJLkFeUy_apevJepkWQrd4Q/edit#gid=1601298030

Thanks!

-- Nirmal

JackMostow commented 6 years ago

Nirmal - We should treat distinct topics (e.g. motivation and data analysis) in separate GitHub issues. I don't know how to split GitHub issues, but this one already has so many posts that we should end it here and put future posts in more focused issues.